Abstract

Recently, the new Kinect One has been issued by Microsoft, providing the next 
generation of real-time range sensing devices based on the Time-of-Flight (ToF) 
principle. As the first Kinect version was using a structured light approach, one 
would expect various differences in the characteristics of the range data delivered 
by both devices. 

This paper presents a detailed and in-depth comparison between both devices. 
In order to conduct the comparison, we propose a framework of seven different 
experimental setups, which is a generic basis for evaluating range cameras such as 
Kinect. The experiments have been designed to capture individual effects of the
Kinect devices in as isolated a manner as possible, and in such a way that they
can also be applied to any other range sensing device. The
overall goal of this paper is to provide solid insight into the pros and cons of
either device. Thus, scientists who are interested in using Kinect range sensing
cameras in their specific application scenario can directly assess the expected
specific benefits and potential problems of either device.
1. Introduction and Related Works


In the last decade, several new range sensing devices have been developed
and made available for application development at affordable costs. In
2010, Microsoft, in cooperation with PrimeSense, released a structured-light (SL)
based range sensing camera, the so-called Kinect™, that delivers reliable depth
images at VGA resolution at 30 Hz, coupled with an RGB color camera at the
same image resolution. Even though the camera was mainly designed for gaming,
it achieved great popularity in the scientific community, where researchers have
developed a large number of innovative applications in different
fields such as online 3D reconstruction, medical applications and
health care, augmented reality [50], etc. Recently, Microsoft released an
update of their Kinect™ camera in the context of their next generation of console
(XBox One) that is now based on the Time-of-Flight (ToF) principle.

Both range sensing principles, SL and ToF, are quite different and are subject
to a variety of error sources (see Sec. 2.3). This paper is meant to thoroughly evaluate
both Kinect™ cameras, denoted as Kinect^SL and Kinect^ToF in the following, in
order to extract their pros and cons which are relevant for any application
incorporating this kind of device. Thus, we explicitly do not try to evaluate the
devices with respect to a set of specific application scenarios; instead, we designed a
set of seven different experimental setups as a generic basis for evaluating range
cameras such as Kinect.

Several studies can be found in the literature that compare and evaluate 
the depth precision of both principles. However, this work is the first study 
comparing both versions of the Kinect cameras and offering detailed descriptions 
under which conditions one is superior to the other. Since Kinect™ cameras
target the consumer market and have known sales of several million devices,
we believe that our work will be valuable for a large number of follow-up research 
projects. 

Prior Work. A complete discussion of prior work in SL- and ToF-based range
sensing would clearly go beyond the scope of this paper. Thus, we give a brief
and exemplary overview of related work in the context of SL- and ToF-based range
sensing and focus on papers that compare different range sensing approaches and
devices. In Sec. 2.3 we further refer to some key papers that deal with specific
characteristics of SL and ToF range data. Additionally, we refer the reader to
the surveys of Berger et al. [3] and Han et al. on the Kinect^SL as well as to
the survey on Time-of-Flight cameras by Kolb et al. [28].

Kuhnert and Stommel [30] demonstrate a first integration of ToF and stereo
cameras. Beder et al. [2] evaluate and compare ToF cameras to a stereo-vision
setup. Both papers emphasize that ToF and stereo data are at least partially
complementary and that an integration thus significantly improves the quality of
range data. Furthermore, the Kinect^ToF does not use triangulation for depth
calculation, and thus it does not suffer much from occlusion. As will be
shown in Sec. 4.8, the occluded area in a static scene is around 5% for the
Kinect^ToF, compared to around 20% for the Kinect^SL. Evangelidis et al. [11] have
also used a ToF range camera in a depth-stereo approach; compared with the
Kinect^SL, the Kinect^ToF would be the better choice for such an approach. For further details
on ToF-stereo fusion we refer the reader to Nair et al. [40]. In the domain of
robotics, Wiedemann et al. compare ToF cameras from different
manufacturers. They analyze the sensor characteristics of such systems and
the application potential for mobile robots. In their work, they address several
problems such as sensor calibration, automatic integration time and data filtering
schemes for outlier removal. Stoyanov et al. compare
the accuracy of two ToF cameras and the Kinect^SL camera to a precise laser
range sensor (aLRF). However, their evaluation methodology does not take into
account the different error sources given by real-time range sensing cameras.
The follow-up work by Langmann et al. compares a ToF camera (pmdtec
CamCube 41k) with the Kinect^SL. The lateral resolution of the depth measurements is
assessed using a three-dimensional Siemens star-like shape. The depth linearity is
also compared using a precise linear rail. The authors conclude that both cameras
have different drawbacks and advantages and thus are meant to be used for
different applications. Meister et al. [38] discuss the properties of the 3D data
acquired with a Kinect^SL camera and fused into a consistent 3D model using the
so-called KinectFusion pipeline in order to provide ground truth data for
low-level image processing. The “targetbox” scene used by Meister et al. [38],
also called “HCI Box”, consists of several objects arranged in a 1 x 1 x 0.5 m
box. Nair et al. [39] discuss quality measures for good ground truth data as well
as measurement and simulation approaches to generate this kind of data. We
generally opted against this kind of ground truth scenery, as this approach
often does not allow a proper separation of the individual error sources and, thus, it
would be nearly impossible to transfer results to another application scenario.

In their book about ToF cameras, Hansard et al. [18] compare ToF
cameras and the Kinect^SL. Their comparison focuses on different material classes.
They use 13 diffuse (“class A”), 11 specular (“class B”) and 12 translucent (“class
C”) objects or object variants for which they acquire geometric ground truth
using an additional 3D scanner and applying white matte spray on each object
surface. As a result, they provide the root mean square error (RMSE) and standard
deviation (SD).

Compared to all prior work, in this paper we focus on a set of experimental
setups covering as complete a list of characteristic sensor effects as possible and
evaluate these effects for the Kinect^SL and the Kinect^ToF cameras presented in
Sec. 2. Before presenting the experiments and results, we discuss the fundamental
problem raised by any attempt to compare these devices in Sec. 3. In Sec. 4
we present our experiments, which are all designed in such a way that individual
sensor effects can be captured in as isolated a manner as possible and that the experiments
are reproducible for other range sensing cameras.

2. Devices Principle 

2.1. Structured Light Cameras - Kinect^SL

Even though the principle of structured light (SL) range sensing is comparatively
old, the launch of the Microsoft Kinect™ (Kinect^SL) in 2010 as an interaction
device for the XBox 360 clearly demonstrates the maturity of the underlying
principle.

Technical Foundations. The structured light approach is an active stereo-vision
technique. A sequence of known patterns is sequentially projected onto
an object, where it is deformed by the geometric shape of the object. The object
is then observed by a camera from a different direction. By analyzing the
distortion of the observed pattern, i.e. the disparity from the original projected
pattern, depth information can be extracted; see Fig. 1.

Knowing the intrinsic parameters of the camera, i.e. the focal length $f$, and
additionally the baseline $b$ between the observing camera and the projector, the
depth of pixel $(x, y)$ can be computed using the disparity value $m(x, y)$ for this
pixel as $d = \frac{b \cdot f}{m(x, y)}$. As the disparity $m(x, y)$ is usually given in pixel units, the
focal length is also converted to pixel units, i.e. $f = \frac{f_{metric}}{s_{px}}$, where $s_{px}$ denotes
the pixel size. In most cases, the camera and the projector are only horizontally
displaced, thus the disparity values are all given as horizontal distances. In this
case $s_{px}$ represents the horizontal pixel size. The depth range and the depth
accuracy relate to the baseline, i.e. longer baselines allow for robust depth
measurements at long distances.
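
The triangulation relation above can be illustrated with a minimal sketch. The function names, the metric focal length and the pixel size are assumptions chosen for illustration only and are not device specifications; the baseline of 7.5 cm is the value reported for the structured-light Kinect in this section.

```python
def focal_length_px(f_metric_mm: float, pixel_size_mm: float) -> float:
    """Convert a metric focal length to pixel units: f = f_metric / s_px."""
    return f_metric_mm / pixel_size_mm


def depth_from_disparity(disparity_px: float, baseline_m: float, f_px: float) -> float:
    """Triangulated depth d = b * f / m(x, y); non-positive disparity maps to infinity."""
    if disparity_px <= 0.0:
        return float("inf")
    return baseline_m * f_px / disparity_px


# Assumed, illustrative optics: 6 mm focal length and 10 micrometer pixels.
f_px = focal_length_px(f_metric_mm=6.0, pixel_size_mm=0.01)
for m in (10.0, 20.0, 40.0):
    print(m, depth_from_disparity(m, baseline_m=0.075, f_px=f_px))
```

Halving the disparity doubles the depth, which is why a fixed disparity resolution translates into a depth resolution that degrades with distance.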

There are different options to design the projection patterns for an SL range
sensor. Several approaches based on the SL principle have been proposed in order
to estimate the disparity resulting from the deformation of the projected light
patterns. In the simplest case, the stripe-pattern sequence realizes a binary code
which is used to decode the direction from which an object point is illuminated by the
projector. Based on this principle, Hall-Holt and Rusinkiewicz introduced a
real-time camera-based 3D system. The authors show that they could achieve
full 3D reconstruction of objects using an automatic registration of different
rotated range maps.

Zhang et al. investigate the benefit of projection patterns composed
of alternating color stripes, creating color transitions that are matched with
observed edges. Their matching algorithm is faster and eliminates the global
smoothness assumptions from the standard SL matching algorithm. Similarly,
Fechteler et al. use this color pattern to reconstruct human faces at high
resolution using only two sequential patterns, which leads to a reduced computational
complexity.

Figure 1: Principle of structured light based systems.

Additionally, Zhang and Huang [56] propose a high-resolution SL camera
based on the use of color fringe patterns and phase-shifting techniques. Their
system was designed to capture and reconstruct dynamic deformable objects such
as human faces at a high frame rate (up to 40 Hz).

SL cameras, such as the Kinect^SL, use a low number of patterns, possibly only
one, to obtain a depth estimation of the scenery at a “high” frame rate (30 FPS).
Typically, such a camera is composed of a near-infrared (NIR) laser projector combined
with a monochrome CMOS camera which captures depth variations of object
surfaces in the scene.

The Kinect^SL camera is based on the standard structured light principle
where the device is composed of two cameras, i.e. a color RGB and a monochrome
NIR camera, and an NIR projector including a laser diode at 850 nm wavelength.
The baseline between the NIR projector and the NIR camera is 7.5 cm; see Fig. 2.
The NIR projector uses a known and fixed dot pattern to illuminate the scenery.

Figure 2: Sensor placement within a Kinect^SL camera. The baseline is approximately 7.5 cm.

Simple triangulation techniques are then used to compute the depth
information from the projected pattern seen by the NIR camera and the
input pattern stored on the unit. For each pixel, depth is estimated by
finding the best-correlating pattern patch, typically in a 9 x 9 pixel window,
between the NIR image and the corresponding projection pattern. The disparity
value is given by this best match. Note that the Kinect^SL device internally
interpolates the best match in order to achieve a
sub-pixel accuracy of 1/8 pixel. A detailed description of the Kinect disparity
map computation can be found at the ROS.org community website [29], where
the Kinect^SL’s disparity map computation has been reverse engineered and a
complete calibration procedure is deduced.
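
The patch-matching step described above can be sketched as follows. This is a strongly simplified illustration and not the device's actual algorithm: it works on a single scanline with a window of 9 samples instead of 9 x 9 pixels, uses the sum of squared differences as matching cost, and refines the best match by a parabolic fit. All function names and the synthetic pattern are assumptions.

```python
import random


def ssd(a, b):
    """Sum of squared differences between two equally long patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match_disparity(reference, observed, center, max_disp, half_window=4):
    """Sub-pixel disparity of the observed patch around `center`.

    The observed patch is compared against the stored reference row shifted
    by 0..max_disp samples; the lowest cost defines the integer disparity.
    """
    lo, hi = center - half_window, center + half_window + 1
    patch = observed[lo:hi]
    costs = [ssd(patch, reference[lo - s:hi - s]) for s in range(max_disp + 1)]
    best = min(range(len(costs)), key=costs.__getitem__)
    # Parabolic interpolation around the best integer match.
    offset = 0.0
    if 0 < best < max_disp:
        c_l, c_0, c_r = costs[best - 1], costs[best], costs[best + 1]
        denom = c_l - 2.0 * c_0 + c_r
        if denom > 0.0:
            offset = 0.5 * (c_l - c_r) / denom
    # Quantize to 1/8 pixel (assumed sub-pixel resolution).
    return round((best + offset) * 8.0) / 8.0


rng = random.Random(0)
reference = [rng.random() for _ in range(64)]   # synthetic stand-in for the dot pattern
shift = 3
observed = [0.0] * shift + reference[:-shift]   # pattern displaced by 3 samples
print(match_disparity(reference, observed, center=30, max_disp=8))
```

The result lies within half a sample of the true shift; the residual offset stems from the parabolic fit, which is only exact for a symmetric cost curve.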

2.2. Time-of-Flight (ToF) Cameras 

The ToF technology is based on measuring the time that light emitted by an
illumination unit requires to travel to an object and back to the sensor array [32].
In the last decade, this principle has been realized in microelectronic devices,
i.e. chips, resulting in new range-sensing devices, the so-called ToF cameras.
Here, we explain the basic principle of operation of ToF cameras. It should
be noted that for the specific device of the new Kinect^ToF camera, issued by
Microsoft Corp. in conjunction with the XBox One game console, only little
technical detail is known.

Figure 3: The ToF phase-measurement principle.

The Kinect^ToF utilizes the Continuous Wave (CW) Intensity Modulation
approach, which is most commonly used in ToF cameras. The general idea is
to actively illuminate the scene under observation using near-infrared (NIR)
intensity-modulated, periodic light (see Figure 3). Due to the distance between
the camera and the object (sensor and illumination are assumed to be at the
same location) and the finite speed of light $c$, a time shift $\phi$ [s] is caused in the
optical signal, which is equivalent to a phase shift in the periodic signal. This
shift is detected in each sensor pixel by a so-called mixing process. The time
shift can be easily transformed into the sensor-object distance, as the light has
to travel the distance twice, i.e. $d = \frac{c \phi}{2}$.

From the technical perspective, the generator signal $g^{ill}$ driving the illumination
unit results in the intensity-modulated signal which, after being reflected
by the scene, results in an incident optical signal $s^{ill}$ on each sensor pixel. Note
that the optical signal may be deformed by nonlinear effects, e.g. in the LEDs of
the illumination unit. The incident signal $s^{ill}$ is correlated with the reference
generator signal $g^{ref}$. This mixing approach yields the correlation function, which
is sampled in each pixel:

$C[s^{ill}, g^{ref}] = s^{ill} \otimes g^{ref} = \lim_{T \to \infty} \int_{-T/2}^{T/2} s^{ill}(t) \, g^{ref}(t) \, dt.$

The phase shift is computed using several correlation measurements with varying
illumination and reference signals $s_i^{ill}$ and $g_i^{ref}$, respectively, using some kind of
demodulation function, i.e.

$\phi = Q(A_0, A_1, \ldots, A_n)$, with $A_i = C[s_i^{ill}, g_i^{ref}]$, $i = 0, \ldots, n$.

Frequently, $A_i$ is called phase image or correlation image. We will use the
latter notation in order to prevent confusion with the phase shift ($\propto$ distance).
In practice, the correlation images are acquired sequentially; however, there is
the theoretical option to acquire all correlation images in parallel, e.g. by having
different phase shifts for neighboring pixels. Note that due to the periodicity of
the reference signal, any ToF camera has a unique unambiguous measurement
range.

The first ToF cameras, like the prototypes from pmdtechnologies, used
sinusoidal signals $g^{ill}(t) = \cos(2\pi f_m t)$ with a constant modulation frequency
$f_m$ and a reference signal equal to $g^{ill}$ with an additional phase offset $\tau$, i.e.
$g^{ref}(t) = g^{ill}(t + \tau)$. For this approach, usually four correlation images
$A_i = C[\cos(2\pi f_m \cdot), \cos(2\pi f_m \cdot + \tau_i)]$ for $\tau_i = i \cdot \pi/2$, $i = 0, 1, 2, 3$ are acquired, leading
to a distance value of

$\phi = Q(A_0, A_1, A_2, A_3) = \operatorname{arctan2}(A_3 - A_1, A_0 - A_2),$

where $\operatorname{arctan2}(y, x)$ is the angle between the positive x-axis and the point given
by the coordinates $(x, y)$.
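
The four-phase demodulation can be checked numerically with a minimal sketch. It assumes ideal sinusoidal signals, for which the correlation value for phase offset $\tau_i$ is proportional to $\cos(\theta + \tau_i)$ with $\theta$ being the round-trip phase shift; the conversion from the phase angle to a time shift via $2\pi f_m$, the function names and the modulation frequency of 20 MHz are assumptions made for this illustration.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def correlation_samples(distance_m, f_mod_hz, amplitude=1.0, offset=0.5):
    """Ideal correlation values A_i for phase offsets tau_i = i * pi / 2."""
    theta = 4.0 * math.pi * f_mod_hz * distance_m / C  # round-trip phase shift
    return [offset + 0.5 * amplitude * math.cos(theta + i * math.pi / 2.0)
            for i in range(4)]


def demodulate(a0, a1, a2, a3):
    """Phase angle arctan2(A3 - A1, A0 - A2), wrapped to [0, 2*pi)."""
    return math.atan2(a3 - a1, a0 - a2) % (2.0 * math.pi)


def phase_to_distance(phase_rad, f_mod_hz):
    """Time shift = phase / (2*pi*f_m); distance d = c * time_shift / 2."""
    time_shift_s = phase_rad / (2.0 * math.pi * f_mod_hz)
    return 0.5 * C * time_shift_s


f_mod = 20e6  # hypothetical modulation frequency in Hz
samples = correlation_samples(distance_m=2.0, f_mod_hz=f_mod)
print(phase_to_distance(demodulate(*samples), f_mod))
```

The constant offset, which models background light, cancels in both differences, so only the modulated part of the signal contributes to the phase estimate.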

The Kinect^ToF camera applies this CW intensity modulation approach.
Blake et al. [5] reverse engineered the Kinect^ToF driver. This revealed that the
Kinect^ToF acquires 10 correlation images, of which nine are
used for a three-phase reconstruction approach based on phase shifts of 0°, 120°
and 240° at three different frequencies. Using multiple modulation frequencies,
the unambiguous measurement range can be extended [9]. Although the Kinect^ToF camera can
obtain depth values for distances longer than 9 meters, the official driver masks
distances beyond approximately 4.5 meters.
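
How several modulation frequencies extend the unambiguous range can be sketched as follows. For a single frequency $f_m$ the phase wraps after a distance of $c / (2 f_m)$; for a set of frequencies the wrap-around is governed by their greatest common divisor. The two frequencies and the brute-force unwrapping below are hypothetical choices for illustration and do not reproduce the Kinect driver, which uses three frequencies.

```python
import math
from functools import reduce

C = 299_792_458.0  # speed of light in m/s


def unambiguous_range(f_mod_hz):
    """Distance after which the phase of a single frequency wraps around."""
    return C / (2.0 * f_mod_hz)


def combined_range(freqs_hz):
    """Unambiguous range of a set of integer frequencies (via their gcd)."""
    return C / (2.0 * reduce(math.gcd, freqs_hz))


def unwrap(wrapped_m, freqs_hz):
    """Pick the wrap count of the first frequency that best agrees with the others."""
    r0 = unambiguous_range(freqs_hz[0])
    limit = combined_range(freqs_hz)
    best, best_err = None, float("inf")
    k = 0
    while wrapped_m[0] + k * r0 < limit:
        candidate = wrapped_m[0] + k * r0
        err = 0.0
        for w, f in zip(wrapped_m[1:], freqs_hz[1:]):
            r = unambiguous_range(f)
            diff = abs(candidate % r - w)
            err += min(diff, r - diff)  # circular difference
        if err < best_err:
            best, best_err = candidate, err
        k += 1
    return best


freqs = (80_000_000, 60_000_000)  # hypothetical modulation frequencies in Hz
true_distance = 6.0
wrapped = [true_distance % unambiguous_range(f) for f in freqs]
print(unwrap(wrapped, freqs))
```

With these assumed values each single frequency wraps below 2.5 m, whereas the pair resolves distances up to roughly 7.5 m.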

The purpose of the tenth correlation image is still not clear. Even though
the technical specifics of the Kinect^ToF have not been explicitly revealed by
Microsoft, it definitely applies the basic principle of correlation as described
above. The illumination unit consists of a laser diode at 850 nm wavelength.

In Sec. 3 we discuss further technical details regarding the Kinect^ToF driver.

2.3. Error Sources for Kinect^SL and Kinect^ToF

SL and ToF cameras are active imaging systems that use standard optics to
focus the reflected light onto the chip area. Therefore, the typical optical effects
like shifted optical centers and lateral distortion need to be corrected, which
can be done using classical intrinsic camera calibration techniques. Beyond these
camera-specific calibration issues, SL and ToF cameras possess several specific
error sources, which are discussed in the following and which also apply to
Kinect^ToF and/or Kinect^SL. As a detailed discussion of prior work in relation
to these error sources would go beyond the scope of this paper, we only give
some relevant links to prior work that relates to the individual effects for either
system.

Ambient Background Light [SL, ToF]: Like any other camera, ToF and
SL cameras can suffer from ambient background light, as it can lead to
over-saturation in case of too long exposure times in relation to the objects’
distance and/or reflectivity, e.g. causing problems for SL systems in detecting
the light pattern. Both the Kinect^ToF and the Kinect^SL are equipped with a
band-pass filter, suppressing background light outside the range of the illumination.
In addition, the Kinect^ToF provides a suppression of background intensity on the chip.

For ToF cameras, specific circuitry has been developed, e.g. the Suppression
of Background Intensity approach for PMD cameras [44], that electronically filters
out the DC part of the light. For SL systems, outdoor application is usually hard
to achieve, which has also been stated for the Kinect^SL.

Multi-Device Interference [SL, ToF]: Similar to any other active sensing
approach, the parallel use of several Kinect cameras may lead to interference
problems, i.e. the active illumination of one camera influences the result of
another camera.


For Kinect^SL, the potential interference problem caused by multiple NIR
patterns projected into the scene is very difficult to solve. Butler et al. [7] propose
a “Shake'n'Sense” setup where one (or each) Kinect^SL device is continuously
shaken using an imbalanced rotating motor. Thus, the projected pattern performs
a high-frequency motion that appears significantly blurred to another device.
An alternative approach is introduced by Berger et al. [4]. They add steerable
hardware shutters to the Kinect^SL devices’ illumination units, resulting in a
time-multiplex approach. For ToF cameras, the signal shape can be altered
in order to prevent multi-device interference, e.g. for sinusoidal signal shapes
different modulation frequencies can simply be used to decouple the devices [27].

Temperature Drift [SL, ToF]: A common effect in many technical devices
is the drift of the system output, i.e. the distance values in the case of Kinect
cameras, during the device warm-up. The major difference between the SL and
the ToF approach is that an SL camera usually does not produce as much heat
as a ToF camera. This is due to the fact that the illumination power required
to cover the full scene width and depth with a sufficient signal-to-noise ratio
(SNR) of the optical signal for a ToF camera is beyond the power needed to
generate the relatively sparse point-based pattern applied by the Kinect^SL. As
a consequence, the Kinect^SL can be cooled passively, whereas the Kinect^ToF
requires active cooling.

For the Kinect^SL, significant temperature drift has been reported by Fiedler
and Müller [14]. Early ToF camera studies, e.g. by Kahlmann et al. [24] of
the Swissranger™ camera, exhibit the clear impact of this warm-up on the range
measurement. More recently, smaller ToF cameras for close-range applications,
such as the camboard-nano series provided by pmdtechnologies, do not require
active cooling; however, no temperature drift investigations have been reported
so far.

Systematic Distance Error [SL, ToF]: Both Kinect cameras suffer from
systematic errors in their depth measurement. For the Kinect^SL, the error is mainly
due to inadequate calibration and restricted pixel resolution for the estimation of
the point locations in the image plane, leading to imprecise pixel coordinates of
the reflected points of the light pattern. Further range deviations for the
Kinect^SL result from the comparably coarse quantization of the depth values,
which increases with the distance from the camera. For Kinect^ToF, on the
other hand, the distance calculation based on the mixing of different optical
signals $s^{ill}$ with reference signals $g^{ref}$ requires either an approximation of the
assumed signal shape, e.g. a sinusoidal shape, or an approximation of the phase
demodulation function Q. Both approximations lead to a systematic error in the
depth measurement. In case of an approximated sinusoidal shape this effect is
also called “wiggling” (see Figure 4, top left). The systematic error may depend
on other factors, such as the exposure time.

For Kinect^SL, Khoshelham and Elberink [26] present a detailed analysis of
its accuracy and depth resolution. They conclude that the systematic error is
below approximately 3 cm; however, it increases towards the periphery of the range image and
with increasing object-camera distance. Smisek et al. present a geometric
method to calibrate the systematic error of the Kinect^SL. Herrera et al.
proposed a joint calibration approach for the color and the depth camera of the
Kinect^SL. Correction schemes applied to reduce the systematic error of ToF
cameras with sinusoidal reference signals simply model the depth deviation using
a look-up table [23] or function fitting, e.g. using B-splines [33].
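
A look-up-table correction of this kind can be sketched as follows. The sketch is an assumption-laden illustration and not the method of [23]: the deviation between measured and reference distance is averaged per distance bin and subtracted after linear interpolation. The sinusoidal deviation with 2 cm amplitude is synthetic, and all names are hypothetical.

```python
import bisect
import math


def build_lut(measured, reference, bin_width):
    """Mean deviation (measured - reference) per bin, as sorted (center, deviation) pairs."""
    sums = {}
    for m, r in zip(measured, reference):
        k = int(m // bin_width)
        s, n = sums.get(k, (0.0, 0))
        sums[k] = (s + (m - r), n + 1)
    return [((k + 0.5) * bin_width, s / n) for k, (s, n) in sorted(sums.items())]


def correct(distance, lut):
    """Subtract the linearly interpolated deviation from a measured distance."""
    centers = [c for c, _ in lut]
    i = bisect.bisect_left(centers, distance)
    if i == 0:
        deviation = lut[0][1]
    elif i == len(lut):
        deviation = lut[-1][1]
    else:
        (c0, d0), (c1, d1) = lut[i - 1], lut[i]
        t = (distance - c0) / (c1 - c0)
        deviation = d0 + t * (d1 - d0)
    return distance - deviation


# Synthetic calibration data: reference distances from 1 m to 4 m with an
# assumed periodic ("wiggling") deviation of 2 cm amplitude.
reference = [1.0 + 0.01 * i for i in range(301)]
measured = [r + 0.02 * math.sin(2.0 * math.pi * r) for r in reference]
lut = build_lut(measured, reference, bin_width=0.05)

raw = 2.3 + 0.02 * math.sin(2.0 * math.pi * 2.3)
print(raw, correct(raw, lut))
```

A B-spline fit would replace the piecewise linear interpolation by a smooth function, which needs fewer parameters when the deviation varies slowly.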

Depth Inhomogeneity [SL, ToF]: At object boundaries, a pixel may
observe inhomogeneous depth values. For SL cameras, due to the structured light principle,
occlusion may happen at object boundaries where parts of the scene are not
illuminated by the infrared beam, which results in a lack of depth information
in those regions (invalid pixels). For ToF cameras, the mixing process results in
a superimposed signal caused by light reflected from different depths, so-called
mixed pixels. In the context of ToF cameras these pixels are sometimes called
flying pixels. The mixed, or flying, signal leads to wrong distance values; see
Figure 4, top right.

There are simple methods relying on geometric models that give good results
in identifying flying pixels, e.g. by estimating the depth variance, which is
extremely high for flying pixels [45]. Denoising techniques, such as a median filter,
can be used to correct some of the flying pixels.

Figure 4: Error sources of ToF cameras. Top left: Systematic (wiggling) error for all pixels
(gray) and fitted mean deviation (black). Top right: Motion artifacts (red) and flying pixels
(green) for a horizontally moving planar object in front of a wall. Bottom left: Schematic
illustration of multi-path effects due to reflections in the scene. Bottom right: Acquisition of a
planar gray-scale checkerboard reveals the intensity-related distance error. (Image courtesy:
[28], Eurographics Association, 2010.)

Note that flying pixels are directly related to a more general problem, i.e.
the multi-path problem; see below.
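
A variance-based flying pixel test can be sketched on a single scanline. The sketch assumes that the variance is estimated over a small spatial neighborhood, which is one possible reading of the approach mentioned above; the window radius and the threshold are hypothetical values.

```python
import statistics


def flag_flying_pixels(row, radius=1, max_std_m=0.05):
    """Flag pixels whose local depth standard deviation exceeds a threshold.

    Direct neighbors of a depth edge are flagged as well, so the test is
    conservative with respect to valid pixels next to the boundary.
    """
    flags = []
    for i in range(len(row)):
        window = row[max(0, i - radius):i + radius + 1]
        flags.append(statistics.pstdev(window) > max_std_m)
    return flags


# Foreground at 1 m, background at 2 m, one mixed pixel in between.
row = [1.0, 1.0, 1.0, 1.0, 1.5, 2.0, 2.0, 2.0, 2.0]
print(flag_flying_pixels(row))
```

Flagged pixels would typically be marked invalid or replaced by a filtered value, at the price of losing some valid measurements along the boundary.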

Multi-Path Effects [SL, ToF]: Multi-path effects relate to an error source
common to active measurement systems: The active light may not only travel
the direct path from the illumination unit via the object’s surface to the detector,
but may additionally travel indirect paths, i.e. being scattered by highly reflective
objects in the scene or within the lens system or the housing of the camera
itself; see Fig. 4, bottom left. In the context of computer graphics this effect
is known as global illumination. For ToF cameras, these multiple responses of
the active light are superimposed in each pixel, leading to an altered signal that does not
resemble the directly reflected signal and thus to a wrong distance. For Kinect^SL,
indirect illumination mainly causes problems for highly reflecting surfaces, as
dots of the pattern may be projected onto other objects in the scene. However,
objects at a flat angle to the camera will lead to a complete lack of depth
information (see also Sec. 4.7).

For ToF cameras, several correction schemes for multi-path effects have been
proposed for sinusoidal signal shapes. Falie and Buzuloiu assume that the
indirect effects are of rather low spatial frequency and analyze the pixel’s
neighborhood to detect the low-frequency indirect component. Dorrington et al. [8]
present an analytic formulation for the signal superposition resulting in a
non-linear optimization scheme per pixel using different modulation frequencies.

Intensity-Related Distance Error [ToF]: Considering a highly reflecting
object and a second object at the same distance to the camera but with low
reflectivity in the relevant NIR range, a reduced SNR is expected for the latter. Beyond this, it
has frequently been reported that ToF cameras have a non-zero biased distance
offset for objects with low NIR reflectivity (see Figure 4, bottom right).

Lindner et al. [35] tackle the specific intensity-related error using phenomenological
approaches. In general, there are at least two possible explanations
for this intensity-related effect. The first explains this effect as a
specific variant of a multi-path effect; the second puts this effect down to
the non-linear pixel response for low amounts of incident intensity.

Semitransparent and Scattering Media [SL, ToF]: As for most active
measuring devices, media that do not perfectly reflect the incident light
potentially cause errors for ToF and SL cameras. In case of ToF cameras, light
scattered within semitransparent media usually leads to an additional phase
delay due to a reduced speed of light.

The investigations by Hansard et al. [18] give a good overview for specular
and translucent, i.e. semitransparent and scattering, media for ToF cameras
with sinusoidal reference signal and the Kinect^SL. Kadambi et al. [22] show
that their coding method (originally designed to solve multi-path errors for
ToF cameras) is able to recover the depth of near-transparent objects using their
resulting time profile (transient imaging). Finally, a detailed state-of-the-art
report is given by Ihrke et al., where different methods are described in order
to robustly acquire and reconstruct such challenging media.

Dynamic Scenery [SL, ToF]: One key assumption for any camera-based
system is that each pixel observes a single object point during the whole acquisition
process. This assumption is violated in case of moving objects or moving
cameras, resulting in motion artifacts. In real scenes, motion may alter the true
depth. Even though Kinect^SL acquires depth using only a single NIR image
of the projected pattern, a moving object and/or camera leads to improper
detection of the pattern in the affected region. ToF cameras such as the Kinect^ToF
require several correlation images per depth image. Furthermore, their correlation
measurements are affected by a change of reflectivity observed by a pixel
during the acquisition. Processing the acquired correlation images while ignoring the
motion present during acquisition leads to erroneous distance values at object
boundaries (see Figure 4, top right).

However, no systematic investigation of the effect of motion blur on the depth
measurement quality has been reported yet for the Kinect^SL. Nevertheless, the
work of Butler et al. [7] exploits motion blur to solve the interference problem
of multiple Kinect^SL devices.

For ToF cameras, several motion compensation schemes have been proposed.
Schmidt and Jähne [46] detect motion artifacts using temporal gradients of the
correlation images $A_i$, i.e. a large gradient in one of the correlation images
indicates motion. This approach also performs a correction using extrapolated
information from prior frames; see also the discussion in Hansard et al. [18], Sec. 1.3.3.
Since motion artifacts result from in-plane motion between subsequent correlation
images, several approaches use optical flow methods in order to re-align
the individual correlation images. Lindner and Kolb [34] apply a fast optical
flow algorithm three times in order to align the four correlation images
$A_0, A_1, A_2, A_3$ to the first correlation image $A_0$. As optical flow algorithms
are computationally very expensive, these approaches significantly reduce the
frame rates for real-time processing. A faster approach is motion detection and
correction using block-matching techniques applied to pixels where motion has been
detected.
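
The idea of detecting motion from temporal gradients of the correlation images can be sketched as follows. This is a simplified illustration and not the method of [46]: the temporal gradient is approximated by the difference between two consecutive acquisitions of the same correlation image, images are stored as flat lists of pixel values, and the threshold is a hypothetical parameter.

```python
def motion_mask(prev_images, curr_images, threshold):
    """Flag pixels where any correlation image changed strongly over time.

    prev_images and curr_images hold the correlation images A_0..A_n of two
    consecutive acquisitions, each given as a flat list of pixel values.
    """
    n_pixels = len(curr_images[0])
    mask = [False] * n_pixels
    for prev, curr in zip(prev_images, curr_images):
        for p in range(n_pixels):
            if abs(curr[p] - prev[p]) > threshold:
                mask[p] = True
    return mask


# Two acquisitions of four correlation images with five pixels each;
# only pixel 2 of the third correlation image changes noticeably.
previous = [[0.5] * 5 for _ in range(4)]
current = [[0.5] * 5 for _ in range(4)]
current[2][2] = 0.9
print(motion_mask(previous, current, threshold=0.1))
```

Pixels flagged in this way can then be excluded from the phase computation or handed to a correction step such as the block matching mentioned above.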


3. General Considerations for Comparing Kinect^SL and Kinect^ToF

Before presenting the experimental setups and the comparison between the
two Kinect devices, we have to consider the limitations which this kind of
comparison encounters. For both the Kinect^SL and the Kinect^ToF camera, there
are no official, publicly available reference implementations which explain all
stages from raw data acquisition to the final range data delivery. Thus, any
effect observed may either relate to the sensor hardware, i.e. to the measurement
principle as such, or to the algorithms applied to raw data or, in a post-processing
manner, to the range data which are integrated in the camera systems.

Anticipating the further discussion in this section, we explicitly opted to work
with both Kinect cameras in a “black box” manner using the official drivers, as
it is impossible to achieve “fair conditions” for the comparison, i.e. a comparison
which neutralizes the effects of the diverse filters applied in range data processing.
This is mainly due to the fact that data processing is applied on the level of raw
data, i.e. disparity maps or correlation images, as well as on the level of range
data; see the detailed discussion below. Attempts to reverse engineer the processing
functionality usually do not lead to the same data quality; see below. Thus,
taking the devices as they are, including the official, closed-source drivers, is the
most appropriate approach from the perspective of utilizing them for any kind
of application.

However, the disparity map from the Kinect^SL differs from the common 
representation, i.e. a disparity value of 0 does not refer to an infinite distance. 
According to the reverse engineered disparity map computation from ROS.org, 
the disparity map is normalized and quantized between 0 and 2047 (using 
11 bits of storage), which requires a more complex mapping function in order to 
convert disparity into depth values. Note that the quantization of the disparity 
map leads to a quantization of the range values, which in some cases negatively 
influences the statistical analysis or even makes it completely useless. 
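
The effect of disparity quantization on the range values can be illustrated with a small sketch. It uses the generic triangulation model z = f·b/d, which is not the normalized 11-bit mapping of the device; the focal length, baseline and disparity step below are assumed values chosen only for illustration.

```python
# Generic triangulation model z = f * b / d (NOT the device's 11-bit mapping).
F_PX = 580.0        # assumed focal length in pixels
BASELINE_MM = 75.0  # assumed baseline in mm
STEP_PX = 0.125     # assumed disparity quantization step in pixels


def depth_mm(disparity_px):
    """Depth for a given disparity under the simple triangulation model."""
    return F_PX * BASELINE_MM / disparity_px


def depth_step_mm(z_mm):
    """Gap between the depth z_mm and the next farther representable depth."""
    disparity = F_PX * BASELINE_MM / z_mm
    return depth_mm(disparity - STEP_PX) - z_mm


for z in (500.0, 1000.0, 2000.0, 3000.0):
    print(f"z = {z:6.0f} mm -> next depth level {depth_step_mm(z):6.2f} mm farther")
```

Under this model the gap between neighboring depth levels grows roughly with the square of the distance, so a static pixel far from the camera tends to toggle between only a few discrete range values.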
Figure 5: Statistics of 200 frames for Kinect^SL and Kinect^ToF acquiring a planar wall (values 
in mm): Mean (left col.), standard deviation (middle col.) and RMSE with respect to a fitted 
plane (right col.) for the Kinect^SL (official driver, 1st row; post-filtered range images, 2nd row) 
and for the Kinect^ToF (official driver, 3rd row; the re-engineered OpenKinect driver, 4th row; 
and the raw range data delivered by the OpenKinect driver, 5th row). 


Figure 6: Single depth frame of a Siemens star for Kinect^SL and Kinect^ToF, range in mm: 
The range images are acquired for the static (left image) and the rotating star (60 RPM, right 
image) for the Kinect^SL (official driver, top left, and post-filtered range images, top right) and for 
the Kinect^ToF (official driver, middle left, the re-engineered OpenKinect driver, middle right, 
and the raw range data delivered by the OpenKinect driver, bottom). White color indicates 
invalid pixels. 


For example, it is impossible to derive a per-pixel noise model for the Kinect^SL 
taking only individual pixel distance measurements of a static scene; see Sec. 4.1 
and Nguyen et al. [?]. 

Different alternatives have been proposed for depth value estimation from 
Kinect^SL disparity maps [26]. In general, it is possible to access the raw data of 
the Kinect^SL camera, i.e. the infrared image of the scene with the dot pattern, but 
it would go far beyond the scope of this paper to provide further insight into 
the Kinect^SL’s method of operation by reverse engineering. On the other hand, 
solely post-processing the delivered range data hardly improves the quality; see 
below. 


As described in Sec. 2.2, the Kinect^ToF camera applies the CW approach. 


Additionally, the reverse engineered OpenKinect driver [5] gives insight into 
some details of the data processing applied in the Kinect^ToF. In a first processing 
stage, the correction images are converted to intermediate images. At this stage 
a bilateral filter is applied. In a second stage, the final range data is computed 
out of the intermediate images, joining the three different range values retrieved 
from the three frequencies. At this level, an outlier (or flying pixel) removal is 
applied. The OpenKinect driver allows both filters to be deactivated, so that the 
delivered range data can be considered as being based on raw correction images. 

The described functionality allows data access on several levels, i.e. 

• Kinect^SL Offic and Kinect^ToF Offic: Range data as delivered by the official 
driver provided by Microsoft for the Kinect^SL and for the Kinect^ToF using 
the Developer Preview driver. 

• Kinect^SL Post: Additional post-processing using filtering; here we use a 
bilateral filter. Note that the filter has to operate on data in which invalid 
pixels are already masked out. 

• Kinect^ToF Open: The reengineered data processing of the OpenKinect 
driver by Blake et al. [5]. 

• Kinect^ToF Raw: The reengineered data processing of the OpenKinect 
driver by Blake et al. [5] with deactivated filtering, i.e. range data directly 
computed from the raw data. 
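
The post-filtering option has to cope with pixels that the driver has already marked as invalid. The following is a minimal sketch of a bilateral filter that skips such pixels; it is not the filter implementation used for the experiments, and the window radius, the two sigmas and the convention that 0 marks an invalid pixel are assumptions.

```python
import math


def bilateral_masked(depth, radius=1, sigma_s=1.0, sigma_r=10.0, invalid=0.0):
    """Bilateral filter on a 2D list of ranges (mm) that ignores invalid pixels."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for y in range(h):
        for x in range(w):
            center = depth[y][x]
            if center == invalid:
                continue  # invalid pixels stay invalid
            acc, norm = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue
                    val = depth[yy][xx]
                    if val == invalid:
                        continue  # masked neighbours contribute no weight
                    # spatial weight times range (similarity) weight
                    ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                    wr = math.exp(-((val - center) ** 2) / (2 * sigma_r ** 2))
                    acc += ws * wr * val
                    norm += ws * wr
            out[y][x] = acc / norm  # norm >= 1, the center pixel has weight 1
    return out


image = [[1000.0, 1000.0, 0.0],
         [1000.0, 1004.0, 1000.0],
         [1000.0, 1000.0, 1000.0]]
print(bilateral_masked(image))
```

Because masked pixels carry no information, such a filter can only smooth the remaining valid measurements; it cannot recover regions that were already removed by the driver.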


We apply these five different options to a simple static scene, where the 
cameras observe a planar wall, analyzing the statistics for 200 frames; see Fig. 5 
(and Sec. 4.8). For this scenario, the data is comparable among different drivers of 
a device, as the cameras have not been moved while switching to a different driver. 
However, the data is not fully comparable between Kinect^SL and Kinect^ToF. 
Additionally, we used a dynamic scenery with a rotating Siemens star; see Fig. 6 
(and Sec. 4.8). 

Footnotes: Kinect For Windows SDK 1.8 (official driver, Kinect^SL); Kinect For 
Windows SDK 2.0 (JuneSDK) (Developer Preview driver, Kinect^ToF). 

The results for the static wall and the Siemens star are presented in Figs. 5 
and 6, respectively. The results can be summarized as follows: 


• Post-processing the Kinect^SL data does not improve the quality, as the 
problematic regions of the range image are already masked out; see Fig. 6, 
top row. The quality of the Kinect^SL device is mainly driven by strong 
depth quantization artifacts, which become apparent in the standard deviation; 
see Fig. 5, middle column, first two rows. 


• The quality of the OpenKinect driver [5] stays somewhat behind the official 
Kinect^ToF driver; see Fig. 5, 3rd and 4th row, i.e. the reverse engineering 
appears to be functionally not fully complete. 


• Disabling the internal filters for the Kinect^ToF mainly shows negative effects 
for the rotating Siemens star; see Fig. 6. The filtering of the correction 
images and the flying pixel removal clearly remove the artifacts at the 
jumping edges of the Siemens star. 


4. Experimental Results And Comparison 

In Secs. 4.2 to 4.8 we present the different test scenarios we designed in order 
to capture specific error sources of the Kinect^SL and the Kinect^ToF cameras. 
Before going into the scenarios, in Sec. 4.1 we will briefly present the camera 
parameters and the pixel statistics. 

Our major design goal for the test scenarios was to capture individual effects 
in as isolated a manner as possible. Furthermore, we designed the scenarios in a way 
that they can be reproduced in order to adapt them to any other range sensing 
system that works in a similar depth range. Tab. 1 gives an overview of the 
different test scenarios and the effects they address; see also Sec. 2.3. We focus 
on the range of 500 mm to 3000 mm, as we operate the Kinect^SL in the so-called 
near-range mode, which is optimized for this depth range. It also lies within the 
depth range supported by the Kinect^ToF, which is 500 mm to 4500 mm. 

Test scenario               Primary effect(s) (•)                      Secondary effect(s) (o) 
Ambient Background Light    Amb. Backgr. Light 
Multi-Device Interference   Multi-Device Interf.                       (one mark, column not legible) 
Device Warm-Up              Temperature Drift 
Rail Depth Tracking         Systematic Error, Intens.-Rel. Error 
Semitransparent Liquid      Semitrans. Scatter.                        Multipath Effect 
Reflective Board            Multipath Effect 
Turning Siemens star        Depth Inhomogeneity, Dynamic Scenery 

Table 1: The different effects relevant for SL- and ToF-based range sensing systems and their 
relation to the designed test scenarios. Each test primarily addresses one or two separable 
effects, denoted by •, and may also address secondary effects, denoted by o. The effects 
considered are: Amb. Backgr. Light, Multi-Device Interf., Temperature Drift, Systematic Error, 
Depth Inhomogeneity, Multipath Effect, Intens.-Rel. Error, Semitrans. Scatter. and Dynamic Scenery. 

For all tests we utilize a Kinect^SL (Kinect for Windows v1 sensor with 
activated near mode) and a Kinect^ToF (Microsoft camera prototype available 
from the Developer Preview Program). Data access for the Kinect^SL is done 
via the official driver provided by Microsoft and for the Kinect^ToF using the 
Developer Preview driver. All data evaluations have been done using Matlab. 

The major quantitative results for the comprehensive comparison are 
summarized in Tab. [?], indicating the major differences, strengths and limitations of 
both systems. 

At this point, we want to refer to the discussion in the previous section and state 
explicitly that both Kinect cameras are used in a “black box” manner. Thus, even though we 
refer to characteristics of the specific range measurement techniques, the resulting 
effects may not only relate to the sensor hardware, i.e. to the measurement 
principle as such, but also to the post-processing integrated into the cameras. 

4.1. Camera Parameters and Noise Statistics 

As most applications require full 3D information, we first estimate the intrinsic 
parameters for both devices using standard calibration techniques based on a 
planar checkerboard from the OpenCV library [?]; see Tab. 2. For both devices, 
50 images of the checkerboard were acquired with different orientations and 
distances. For the Kinect^ToF we directly use the amplitude image delivered by 
the camera, whereas for the Kinect^SL we use the NIR image of the depth sensor. 
Since the dot pattern of the Kinect^SL may degrade the checkerboard detection 
quality in the NIR image, we block the laser illumination and illuminate the 
checkerboard with ambient illumination. 

Footnotes: Microsoft Developer Network, Kinect sensor; Kinect for Windows, features; 
Kinect For Windows SDK 1.8; Kinect For Windows SDK 2.0 (JuneSDK). 

Parameter                     Kinect^SL                    Kinect^ToF 
Resolution                    640x480                      512x424 
Focal Length [px]             (583.46, 583.46)             (370.79, 370.20) 
Center [px]                   (318.58, 251.55)             (263.35, 202.61) 
Dist. (k1, k2, k3, p1, p2)    -0.07377, 0.1641, 0, 0, 0    0.09497, -0.2426, 0, 0.00076, -0.00017 

Table 2: Camera parameters of the depth sensors for Kinect^SL and Kinect^ToF. The distortion 
coefficients are radial (k1, k2, k3) and tangential distortion (p1, p2). 
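
As a sketch of how such intrinsic parameters are used to obtain 3D information, the following back-projects a pixel with a known depth into camera coordinates using the focal lengths and centers from Tab. 2. The sketch ignores lens distortion and assumes that the depth value is measured along the optical axis; both are simplifications, and the dictionary keys are illustrative names.

```python
# Intrinsics taken from Tab. 2: (fx, fy, cx, cy) in pixels.
INTRINSICS = {
    "Kinect_SL": (583.46, 583.46, 318.58, 251.55),
    "Kinect_ToF": (370.79, 370.20, 263.35, 202.61),
}


def backproject(u, v, z_mm, device):
    """Pixel (u, v) with depth z_mm along the optical axis -> 3D point in mm.

    Lens distortion is ignored to keep the sketch short.
    """
    fx, fy, cx, cy = INTRINSICS[device]
    return ((u - cx) / fx * z_mm, (v - cy) / fy * z_mm, z_mm)


print(backproject(0, 0, 1200.0, "Kinect_ToF"))
```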

Furthermore, we want to analyze the noise statistics for the Kinect^SL and 
the Kinect^ToF. As already stated, the strong quantization applied in 
the Kinect^SL makes it hard to derive per-pixel temporal statistic values. In 
the literature there are alternative approaches using a mixed spatiotemporal 
analysis to derive some kind of noise statistics [42], but this approach is difficult 
to compare with pure temporal statistics. Therefore, we focus on the Kinect^ToF’s 
temporal statistics only. 

For the temporal statistics we acquired 5000 frames of the Kinect^ToF observing 
a planar wall at about 1 meter distance. The OpenKinect driver with deactivated 
filtering was used to obtain unchanged range data. Fig. 7 shows the histograms 
for a central, an intermediate and a corner pixel of this time series including fits 
for a Gaussian and a Poisson distribution. Both fits were done using MATLAB. 
We use non-linear least-squares optimization in order to get suitable 
parameters for the Poisson distribution. Tab. 3 gives the resulting parameters 
of both fits for the three pixel statistics as well as the corresponding RMSD. 
It can be noted that corner pixels have a higher variance than pixels in the 
center area of the image, which is due to a reduced amplitude of the illumination 
in corner regions. We can also deduce that the Poisson and the Gaussian fit 
result in the same fitting quality. 

Pixel       Gaussian (μ, σ)     RMSD_G    Poisson (λ, δx)       RMSD_P 
Center      [1023.91, 4.42]     0.0025    [17.47, 1007.47]      0.0024 
Intermed.   [1074.77, 3.77]     0.0017    [14.10, 1061.40]      0.0017 
Corner      [1127.21, 24.03]    0.0019    [101.73, 1030.02]     0.0019 

Table 3: Temporal statistics of three different pixels of the Kinect^ToF sensor with the corresponding 
Gaussian and Poisson fits. The value δx [mm] denotes the shift applied to the range values to match 
the Poisson distribution. 

Figure 7: Density distribution of 3 different pixels (center, intermediate and corner pixel) 
during 5000 frames for Kinect^ToF acquiring a planar wall (values in mm) 
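
A rough way to see why both models can describe the data similarly well is to compare a Gaussian with a shifted Poisson distribution of equal mean and variance. The sketch below is an illustration only: it uses moment matching instead of the non-linear least-squares fit described above, and it evaluates the Poisson model for non-integer arguments via the gamma function, which is an assumption about how a non-integer shift can be handled.

```python
import math


def gauss_pdf(x, mu, sigma):
    """Gaussian density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def shifted_poisson(x, lam, shift):
    """Poisson model evaluated at k = x - shift, extended to real k via lgamma."""
    k = x - shift
    if k < 0:
        return 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1.0))


def moment_matched_poisson(mu, sigma):
    """Shifted Poisson with the same mean and variance as N(mu, sigma^2)."""
    lam = sigma ** 2
    return lam, mu - lam


def rmsd(mu, sigma, lam, shift, lo, hi):
    """Root-mean-square deviation between both models on the integers lo..hi."""
    sq = [(gauss_pdf(x, mu, sigma) - shifted_poisson(x, lam, shift)) ** 2
          for x in range(lo, hi + 1)]
    return math.sqrt(sum(sq) / len(sq))


mu, sigma = 1023.91, 4.42  # Gaussian parameters of the center pixel (Tab. 3)
lam, shift = moment_matched_poisson(mu, sigma)
print(lam, shift, rmsd(mu, sigma, lam, shift, 1000, 1050))
```

The moment-matched parameters differ somewhat from the least-squares values listed in Tab. 3, which is expected, since the two estimation procedures optimize different criteria.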

4.2. Ambient Background Light 

Goal. This test scenario addresses the influence of ambient light on the range 
measurement of the Kinect^SL and the Kinect^ToF cameras. The primary goal of 
the experiment is to show the relation between the ambient background radiance 
incident to the Kinect and the delivered depth range of the sensor. The main 
focus of this experiment is thus to accurately measure the incident background 
radiance with respect to image regions. 

As both Kinect cameras have imperfect illumination in the sense that 
pixels in the periphery of the image receive less active light than pixels close to the 
center, a secondary goal is to give some insight into a possible spatial variation 
of the influence of ambient background light. 

Experimental Setup. The Kinect camera is mounted 1200 mm in front of an 
approximately diffuse white wall in an environment where the amount of light 
can be controlled using three HALOLINE ECO OSRAM 400W halogen lamps. 
Figure 8: Ambient background light experiment setup (labeled components: Kinect camera, 
laser pointer, powermeter, astronomy camera). The laser pointer is mainly hidden by 
the pipe. 


The radiosity on the wall depends on the number of active lamps and their 
distance to the wall. We measure the radiant emittance of the surface resulting 
from our light sources with a Newport 818-SL powermeter. The powermeter 
directly delivers power intensity in W/cm² and it is calibrated to 850 nm, which 
corresponds to the Kinect’s laser diode illumination of 850 nm. An additional laser 
pointer allows for the directional adjustment of the powermeter’s pipe to a point 
on the wall in order to accurately measure the radiance. To register the point 
on the wall with a pixel in the Kinect camera, we temporarily attached a small 
cuboid as depth marker to the wall. 

As both Kinect cameras have an NIR filter suppressing visible light, we 
equip the powermeter with an additional Kodak Wratten high-pass filter number 
87C. This filter has a 50% cutoff at 850 nm. A pipe is mounted to the powermeter 
in order to measure the incident radiance for a specific spatial direction from 
a single point on the wall. 

Footnote: We did not explicitly measure the NIR filter, as this would require 
destroying the Kinect camera. 

We further add an Atik 4000 astronomy photography camera equipped with 
the same NIR filter alongside the powermeter in order to verify the radiance 
measurements provided by the powermeter setup. The astronomy camera 
measures the radiant emittance in a linear relation to the number of photons 
received per pixel, i.e. per observed direction. In our experiments we found a 
proper linear relation between both measurements. 

We relate the radiance measurement of the powermeter to daylight 
conditions. To this end, we acquired a radiant flux density reference measurement 
of direct sunlight with the powermeter setup without pipe on a sunny summer 
day in central Europe. This results in 11 mW/cm². Furthermore, we relate the 
radiant flux density measurement to the incident radiance measurement of an 
indirectly illuminated diffuse white paper using the powermeter with pipe at 
1.2 m distance, resulting in a factor of 1.1 · 10³. As the latter setup is comparable 
to the evaluation setup for the Kinect cameras, we can deduce a sun reference 
incident radiance value of about 10 µW/cm². 
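
The relation between the two reference values can be written out explicitly. The symbols E_sun (radiant flux density of direct sunlight, measured without pipe) and L_sun (incident radiance reference, measured with pipe) are names introduced here for illustration; the magnitude of the conversion factor follows from the two quoted values.

```latex
\[
L_{\mathrm{sun}} \;\approx\; \frac{E_{\mathrm{sun}}}{1.1 \cdot 10^{3}}
  \;=\; \frac{11\ \mathrm{mW/cm^2}}{1.1 \cdot 10^{3}}
  \;=\; 10\ \mu\mathrm{W/cm^2}
\]
```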

The final radiance measurements are done with the powermeter setup. The 
radiance measurements take place when the Kinect camera is turned off in order 
to prevent interference with the camera’s illumination unit. We acquired 200 
frames for various light conditions up to 20 µW/cm². Since we expect some 
variation of the effect for different pixel locations, we measured three points 
along the half diagonal, i.e. a point close to the upper left corner, the principal 
point of the range image and one intermediate point in between both points. 


Evaluation and Results. We apply a variance analysis for the K = 200 frames 
of range values D_i(u,v), i = 1, ..., K, delivered by each camera for each of the 
pixels (u,v) (center, intermediate, corner) by computing the Standard Deviation 
(SD) over time 

\[
SD = \sqrt{ \frac{1}{K} \sum_{i=1}^{K} \Big( D_i(u,v) - \frac{1}{K} \sum_{j=1}^{K} D_j(u,v) \Big)^2 } \tag{1}
\]

and plot this as a function of the ambient light intensity; see Fig. 9, top. 
Additionally, Fig. 9, bottom, shows explicit distance values including box plots 
as a function of the ambient light. 

Figure 9: Kinect Comparison for Ambient Background Light. The SD (top row) and distance 
statistics (bottom row) for the center, intermediate and corner pixel for Kinect^SL (left) and 
the Kinect^ToF (right). Horizontal axes: light power per surface unit (W/cm²). 
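
The per-pixel temporal standard deviation of Eq. 1 can be sketched as follows. The frame layout (`frames[i][v][u]` holding a range value in mm) and the use of the 1/K normalization are assumptions of this sketch.

```python
import math


def temporal_sd(values):
    """Standard deviation over time for one pixel (1/K normalization)."""
    k = len(values)
    mean = sum(values) / k
    return math.sqrt(sum((d - mean) ** 2 for d in values) / k)


def pixel_series(frames, u, v):
    """Collect D_i(u, v) over all frames; frames[i][v][u] is a range in mm."""
    return [frame[v][u] for frame in frames]


# Four tiny 1x1 "frames" of a static scene
frames = [[[1200.0]], [[1202.0]], [[1198.0]], [[1200.0]]]
print(temporal_sd(pixel_series(frames, 0, 0)))
```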

It can be observed that the Kinect^SL is not able to handle background light 
beyond 1 µW, whereas the Kinect^ToF delivers range data throughout the full 
range of ambient light applied in the experiment. 

The Kinect^SL delivers more robust depth values than the Kinect^ToF throughout 
the ambient background light range where valid data is delivered. All 
observed pixels are below 6 mm SD and the max. variation from the median 
is 25 mm for the corner pixel. The SD and the box plots show that for the 
Kinect^SL the depth variation is hardly affected by the ambient light, as long as 
valid range data is delivered. The plots for the different pixels show that the 
variation increases for pixels closer to the image periphery. 

The Kinect^ToF, on the other hand, shows the expected rise in the depth 
variation for increasing ambient light due to a reduced SNR. While the center 
and the intermediate pixel show an SD below 6 µW similar to that of the Kinect^SL, 
i.e. below 4 mm, the box plots reveal a larger number of outliers compared to the 
Kinect^SL. However, the Kinect^ToF’s corner pixel delivers worse SD and quantile 
statistics than the one for the Kinect^SL. In the range beyond 10 µW ambient light, 
the variation increases to some 22, 12 and 42 mm for the center, intermediate 
and corner pixel, respectively. The effect that the center pixel gets worse than 
the intermediate pixel may be explained by oversaturation effects solely due to 
the active illumination of the Kinect^ToF. 


4.3. Multi-Device Interference 

Goal. This experiment addresses the problem arising from the interference of 
the active illumination between several Kinect-cameras of the same type, when 
running them in parallel. Primarily, we want to evaluate the influence of the 
interference on the range measurement. Secondarily, we want to gain some 
insight into the temporal and spatial distribution of the artifacts. 


Note that, in contrast to other ToF-cameras (see Sec. 2.3), we are not able 
to modify the modulation frequencies for the Kinect^ToF in order to reduce or 
suppress multi-device interference artifacts. 



Figure 10: Multi-device interference experiment setup 
Figure 11: Multi Device Interference: Error for the static scene for the Kinect^SL with moving 
interference device (left) and for the static Kinect^ToF (right) as boxplot statistics for the 
interference situation (top row) and RMSE plot with and without interference including invalid 
pixel counts for the interference setup (bottom row). 

Experimental Setup. The general idea of the experiment is to acquire an 
approximately diffuse white planar wall, adding a second Kinect device of the 
same kind as interference over a longer period of time. As the Kinect^SL uses a 
static structured light pattern, a fully static setup may not capture the overall 
interference. As circumventing interference for the Kinect^SL may not always be 
possible with a “Shake’n’Sense”-like approach [7], we investigate the influence of 
the camera poses of the two devices on the interference. Thus, for the Kinect^SL 
setup, we mount the interfering device on a turntable and rotate it ±10° about 
the vertical axis with 1 RPM in order to get a variation of the overlay of the 
SL-patterns of the two devices. The angular speed is low enough to prevent any 
motion artifacts. We also investigated different inter-device distances, but the 
resulting impact on the interference was comparable. For the Kinect^ToF setup, the 
interfering device is always static. The distance between the wall and the Kinect 
was set to 1.2 m and the distance between the devices is 0.5 m; see Fig. 10. We 
do not take the exact orientation of the measuring and the interfering device 
into account, but both devices observe 
approximately the same region on the wall. 


Evaluation and Results. In order to account for a potential misalignment of the 
Kinect towards the wall, we use a RANSAC plane fit to the range data with 
inactive interference device. The per-pixel range values deduced from this plane 
fit are considered as reference distances D^{ref}(u,v). We compute the deviation 
for each frame D_i with respect to the reference distance as Root-Mean-Square 
Error (RMSE), i.e. 

\[
RMSE = \sqrt{ \frac{1}{m \cdot n} \sum_{u=1}^{n} \sum_{v=1}^{m} \big( D_i(u,v) - D^{ref}(u,v) \big)^2 } \tag{2}
\]

where n, m represent the width and height of the range image, respectively. 
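
A minimal sketch of this evaluation is given below: a basic RANSAC plane fit on 3D points and the RMSE of a range image against a reference image. The inlier threshold, the iteration count and the fixed random seed are assumptions; the sketch is not the implementation used for the experiments and omits the final refinement step that many RANSAC variants apply to the inlier set.

```python
import math
import random


def plane_from_points(p, q, r):
    """Unit normal n and offset d of the plane n.x = d through three points."""
    a = [q[i] - p[i] for i in range(3)]
    b = [r[i] - p[i] for i in range(3)]
    n = [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None  # degenerate (collinear) sample
    n = [c / norm for c in n]
    return n, sum(n[i] * p[i] for i in range(3))


def point_plane_distance(point, plane):
    """Absolute distance of a 3D point to the plane (n, d)."""
    n, d = plane
    return abs(sum(n[i] * point[i] for i in range(3)) - d)


def ransac_plane(points, threshold_mm=5.0, iterations=200, seed=0):
    """Return the sampled plane that has the most inliers."""
    rng = random.Random(seed)
    best, best_count = None, -1
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        count = sum(point_plane_distance(pt, plane) <= threshold_mm for pt in points)
        if count > best_count:
            best, best_count = plane, count
    return best


def rmse(frame, reference):
    """Root-mean-square error of a range image against a reference image."""
    diffs = [(a - b) ** 2 for ra, rb in zip(frame, reference) for a, b in zip(ra, rb)]
    return math.sqrt(sum(diffs) / len(diffs))
```

The reference image itself would be obtained by evaluating the fitted plane along the viewing ray of each pixel, which requires the camera intrinsics.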

As can be seen in Fig. 11, the active frequency pattern of the Kinect^ToF 
has a stronger interference than the structured light pattern for the Kinect^SL 
for most poses of the interfering camera. On average, the Kinect^SL shows little 
interference effect (RMSE < 5.6 mm), beside some very prominent poses (RMSE 
up to 9.4 mm). Fig. 12, mid-right, shows a sample range image with a high 
RMSE. The Kinect^ToF camera shows low interference for the majority of the 
frames (RMSE < 5 mm), but extreme interference errors for some 25% of the 
frames (RMSE up to 19.3 mm) that occur in a sequence which has a nearly 
constant repetition rate. This behavior is most likely due to the asynchronous 
operation of the two devices. A signal drift over time between the signals 
generated in both devices would lead to a repetitive interference pattern like 
the one observed. The range statistics represented in Fig. 11, top, show that 
the median of the Kinect^SL is not altered by interference, which is mainly due to 
the strong quantization applied in the disparity maps. In phases of maximum 
interference, the Kinect^ToF delivers an increased drift of the median, up to 5 mm, 
and a stronger variation. 

Figure 12: Sample range images for the multi-device interference setup: The Kinect^SL range 
image for the initial pose of the interfering device, left, and for two poses with a high invalid 
pixel count, mid-left, and a high RMSE, mid-right. A Kinect^ToF range image with high 
invalid pixel count and high RMSE, right. 

Regarding the invalid pixels, the Kinect^SL nearly always delivers invalid 
pixels. For the initial pose, we find some 1.5% invalid pixels; see Fig. 12, left. 
While changing the pose of the interfering device, we find up to 16.3% invalid 
pixels; see Fig. 12, mid-left. The Kinect^ToF does not deliver invalid pixels in the 
non-interfered periods, but in the interference periods up to 22.7% of invalid 
pixels have been observed; see Fig. 12, right. 

We want to point out that we always observe strong variations within the 
first 400 frames, i.e. the first 13 seconds after starting the acquisition with 
the Kinect^SL. In this experiment we have an increased RMSE of up to 6.7 mm 
without interference and 7.7 mm with interference. Therefore, it is advisable 
not to use this initial sequence captured with the Kinect^SL. 


4.4. Device Warm-Up 

Goal. This test scenario is designed to evaluate the drift in range values during 
the warm-up in standard operation, i.e. the stability of the range measurements 
of both Kinects with respect to the operating time. 

Experimental Setup. We accommodate the device in a room with a constant 
temperature of 21°C which is actively controlled by an air conditioning system 
with a variance below 0.1°C. We start to operate the device measuring a planar 
wall at a distance of 1200 mm. We acquire 200 frames in a row, drop frames 
for 15 s (450 frames) afterwards, and repeat this until a total time of 120 minutes. 
During the acquisition a digital thermometer (precision ±0.1°C) records the 
temperature inside the Kinect devices. The temperature in the device interior is 
measured with a flexible sensor tip inserted through the ventilation holes. Thus, 
the devices remained intact in order to keep the original temperature dissipation 
system. 

Figure 13: Device Warm Up: Distance box plots (top row) and error mean and temperature 
versus warm-up time (bottom row) for Kinect^SL (left) and Kinect^ToF (right). 

Figure 14: Device Warm Up: depth error at minute 60 for Kinect^SL (left) and Kinect^ToF 
(right). 
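
The acquisition schedule can be checked with a short calculation. It assumes that both devices deliver exactly their nominal 30 frames per second, which is implied by 450 dropped frames in 15 s but will only hold approximately in practice.

```python
FRAME_RATE_HZ = 30.0       # assumed nominal frame rate
FRAMES_PER_SEQUENCE = 200  # frames recorded per sequence
PAUSE_S = 15.0             # pause between sequences in seconds
TOTAL_S = 120 * 60         # total duration of the experiment in seconds

dropped_frames = PAUSE_S * FRAME_RATE_HZ
cycle_s = FRAMES_PER_SEQUENCE / FRAME_RATE_HZ + PAUSE_S
sequences = int(TOTAL_S // cycle_s)

print(f"dropped frames per pause: {dropped_frames:.0f}")
print(f"cycle length: {cycle_s:.2f} s, complete sequences: {sequences}")
```

Under this assumption one cycle lasts a little under 22 s, so the two-hour run yields a few hundred sequences of 200 frames each.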

Evaluation and Results. As the variation in the range data is smaller for the 
cold device than for the warm device, we make a RANSAC fit to the averaged 
first steady sequence of 200 frames, resulting in a reference depth image D^{ref}. 
However, for the Kinect^SL the first steady sequence of 200 frames was captured 
after 10 minutes, as we observe a very strong variation in this initial range of 
measurements. Nevertheless, since the RANSAC is applied to the whole frame, 
there might be some bias in the reference frame. We calculate the RMSE for the 
average \bar{D} of all 200 frames in a frame sequence with respect to the fitted 
plane as 

\[
RMSE = \sqrt{ \frac{1}{m \cdot n} \sum_{u=1}^{n} \sum_{v=1}^{m} \big( \bar{D}(u,v) - D^{ref}(u,v) \big)^2 } \tag{3}
\]

Furthermore, we calculate the per-pixel standard deviation average (SDA) 
for each sequence of K = 200 frames D_i: 

\[
SDA = \frac{1}{m \cdot n} \sum_{u=1}^{n} \sum_{v=1}^{m} \sqrt{ \frac{1}{K} \sum_{i=1}^{K} \big( D_i(u,v) - \bar{D}(u,v) \big)^2 } \tag{4}
\]



The results for the device warm-up test are shown in Fig. 13. The fluctuation 
in the temperature of the Kinect^ToF is due to the cooling system, which gets 
activated and deactivated depending on the system temperature. For the Kinect^SL 
there is only a small temperature difference of 11°C after 120 minutes. The 
results show that the Kinect^ToF has in general less error than the Kinect^SL and that 
its SDA and RMSE are nearly constant over time. The Kinect^SL has strong error and 
variation fluctuations in the first 10 min of the warm-up phase. After the device 
has reached its operation temperature, the distance SDA stays within 1 mm, 
which is slightly better than for the Kinect^ToF, which has a distance SDA of 
1.5 mm. However, the distance error is higher for the Kinect^SL (RMSE < 7.1 mm) 
than for the Kinect^ToF (RMSE < 5.3 mm). The distance box plots in Fig. 13, top, 
show less variation for the Kinect^SL than for the Kinect^ToF. However, we again 
point out that the homogeneous appearance of the box plots for the Kinect^SL 
partially results from the heavy quantization applied in this device. 

Fig. 14 shows the depth error with respect to the fitted plane in absolute signed 
values. The depth images are taken at minute 60, when both devices are at a 
stable temperature. As can be seen in the depth images, the Kinect^ToF delivers 
smoother results with fewer out-of-plane errors compared to the Kinect^SL, which is 
consistent with the RMSE values at minute 60. 
4.5. Rail Depth Tracking 

Goal. This test scenario primarily addresses the quality of the range data 
with respect to ground truth distances for a planar wall. The test involves the 
linearity, including planarity tests, as well as the intensity related error. The 
latter applies only to the Kinect^ToF camera. A secondary goal is to give some 
indication of the dependence of the error on the pixel location; therefore we 
evaluate the error at a few different image locations. 

Experimental Setup. The setup comprises a motorized linear rail mounted 
perpendicular to a white wall, which measures distances between 0.5 m and 5 m at 
a step size of 2 cm. The camera is mounted on the carriage of the rail facing 
perpendicular to the wall. As the wall does not cover the full range image 
for farther distances, we evaluate planarity and linearity of the camera only 
within a region-of-interest including pixels lying on the white flat wall in the full 
distance range. The pixel region of interest is (1,1) to (630,480) for the Kinect^SL and 
(74,4) to (502,416) for the Kinect^ToF. Furthermore, we observe some pixels along a 
line-of-interest from the image center to the top-left corner, which always 
cover the wall. We acquire 200 frames for each distance. For the evaluation of 
the intensity-related error, the acquisition is repeated with a 5 x 6 checkerboard 
attached to the wall. The checkerboard consists of 10 gray-level rectangles on 
white background, where the gray-level degrades from 100% to 0% black. The 
checkerboard has been printed using a standard laser printer which delivers 
sufficiently proportional reflectivity in the visual and the NIR range. 

In order to re-project the range values into 3D-space, we first estimated 
the camera intrinsics using the well known photometric calibration technique 
from [?]. Similar approaches have been applied to the Kinect^SL [37] (where 
the laser beam is obstructed and an incandescent lamp is used to highlight the 
checkerboard in order to acquire a reliable NIR image of the calibration rig) and 
to ToF-cameras [33]. 
Figure 15: Linearity Error for four points along the line of interest: Distance error and SD 
for Kinect^SL (left) and Kinect^ToF (right). The corner pixel of the Kinect^ToF delivers invalid 
depth after about 3100 mm, therefore no depth values are given for this range. 

4.5.1. Linearity - Evaluation and Results 

The evaluation of the linearity requires a proper measurement of the ground 
truth distances for the range images acquired with the rail system. As a perfect 
orthogonal alignment of the camera towards the wall cannot be guaranteed, we 
propose to bypass this problem using photometric methods. Having a complete 
lens calibration of both camera systems (i.e. depth and color intrinsic and 
distortion parameters) and the extrinsic transformation between the High-Res 
color camera and the depth camera, the precise 3D camera position of the depth 
camera can be obtained using a simple black-white checkerboard reference which is fixed 
to the planar wall and which we acquire at 10 different rail positions. The 
corresponding 3D positions of the depth camera relative to the reference wall are 
computed using the standard method [57]. A 3D line was fitted to these 3D positions 
using a RANSAC approach, which gives a robust estimate of the orientation of the 
linear rail. Finally, knowing the precise displacement of each measurement of the 
linear rail (we use a 2 cm step size), the 3D position of the camera can robustly 
be estimated and thus a precise ground truth of the wall can be generated using 
the lens parameters of the depth camera. Having the ground truth distance 
D^{gt}_d(u,v) for a given camera-to-wall distance d for each pixel, we calculate the 
signed error (SE) for the average \bar{D}_d(u,v) of all depth measurements at the 
rail position, thus suppressing sensor noise, 

\[
SE_d = \frac{1}{k \cdot l} \sum_{(u,v) \in ROI} \big( \bar{D}_d(u,v) - D^{gt}_d(u,v) \big) \tag{5}
\]

within the region-of-interest (ROI) consisting of k x l pixels. For some pixels along the 
line-of-interest we also evaluate the individual signed linearity errors. 

Furthermore, we calculate the variance for each pixel in the region-of-interest 
using the 200 frames D_{i,d} taken for each camera-to-wall distance d in order to 
retrieve the standard deviation average SDA according to Eq. 4. 
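
The ground truth generation and the signed error can be sketched as a ray-plane intersection followed by an average over the region-of-interest. The sketch ignores lens distortion, returns the range along the viewing ray, and uses the sign convention measured minus ground truth; all three are assumptions made for illustration, as are the function and parameter names.

```python
import math


def ground_truth_range(u, v, intrinsics, normal, offset_mm):
    """Distance along the viewing ray of pixel (u, v) to the plane n.X = offset."""
    fx, fy, cx, cy = intrinsics
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)      # ray direction with z = 1
    denom = sum(n * r for n, r in zip(normal, ray))
    z = offset_mm / denom                          # depth along the optical axis
    return z * math.sqrt(sum(r * r for r in ray))  # range along the viewing ray


def signed_error(mean_depth, gt_depth, roi):
    """Mean of (measured - ground truth) over the ROI pixels given as (u, v)."""
    diffs = [mean_depth[v][u] - gt_depth[v][u] for (u, v) in roi]
    return sum(diffs) / len(diffs)


# Wall exactly perpendicular to the optical axis at 1000 mm (illustrative values)
intr = (370.79, 370.20, 263.35, 202.61)
print(ground_truth_range(263.35, 202.61, intr, (0.0, 0.0, 1.0), 1000.0))
print(ground_truth_range(0.0, 0.0, intr, (0.0, 0.0, 1.0), 1000.0))
```

For a wall that is not exactly perpendicular to the optical axis, the plane normal and offset would come from the estimated camera pose at the respective rail position.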

Fig. 15, top, shows the signed linearity errors for both Kinects for the selected 
pixels with some box plots superimposed. As can be seen in Fig. 15, right, the 
Kinect^ToF delivers more precise range data than the Kinect^SL, if the corner pixel 
is not taken into account. In the proposed work range of the Kinect^SL below 
3 m the error lies in the range of [-34, -1.5] mm for the best (central) pixel and 
of [2.5, 76] mm for the worst (peripheral) pixel. The SD for the Kinect^SL below 
3 m is very similar for all pixels and is below 30 mm. Above 3 m the distance 
error of the Kinect^SL strongly increases for peripheral pixels. Even though 
there seem to be some fluctuations in the distance error for the Kinect^ToF, this 
effect is much smaller and much less regular than the “wiggling” error observed 
so far for ToF-cameras [?]. For pixels not in the extreme periphery the 
absolute per-pixel distance error and the SD lie in the range of [129, -34] mm 
and [0.4, 14] mm, respectively. For the corner pixel the SD range increases to 
[0.4, 28] mm. 

Figure 16: Planarity Error. The standard deviation of the pixels within the region-of-interest 
for the Kinect^SL (red) and the Kinect^ToF (blue). Additionally, the theoretical random error 
deduced by Khoshelham and Elberink [26] for the Kinect^SL is shown. 

4.5.2. Planarity - Evaluation and Results

The region-of-interest in each range image acquired using the rail lies on a
planar wall, so the range measurements should ideally form a plane.
Similar to Khoshelham and Elberink [26], we apply a RANSAC plane fitting
method to exclude outliers and calculate the standard deviation of the points from
the fitted plane as the planarity error.
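
A minimal sketch of such a planarity measure is given below. It is an illustration only: the inlier threshold, the number of iterations, the least-squares refit on the inliers and the function names `fit_plane` and `ransac_planarity_error` are assumptions made for this example and are not taken from the evaluation scripts.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through (n, 3) points: returns (unit normal, centroid)."""
    centroid = points.mean(axis=0)
    # The right singular vector of the smallest singular value is the normal.
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    return vt[-1], centroid

def ransac_planarity_error(points, threshold=10.0, iterations=200, seed=0):
    """SD of the inlier distances to a RANSAC-fitted plane (same unit as points)."""
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(iterations):
        sample = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:  # skip degenerate (collinear) samples
            continue
        distances = np.abs((points - sample[0]) @ (normal / norm))
        inliers = distances < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None:
        raise ValueError("no valid plane hypothesis found")
    # Refit on all inliers and measure the out-of-plane deviation.
    normal, centroid = fit_plane(points[best_inliers])
    residuals = (points[best_inliers] - centroid) @ normal
    return float(residuals.std())

# Synthetic wall at 2000 mm with 2 mm noise and a few gross outliers.
rng = np.random.default_rng(1)
xy = rng.uniform(-500.0, 500.0, size=(2000, 2))
z = 2000.0 + rng.normal(0.0, 2.0, size=2000)
wall = np.column_stack([xy, z])
wall[:20, 2] += 300.0
planarity_sd = ransac_planarity_error(wall)
print(f"planarity error: {planarity_sd:.2f} mm")
```

The outliers are rejected by the consensus step, so the reported SD reflects only the noise of the points that actually lie on the wall.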

Fig. 16 shows the planarity error as SD for both Kinects and the theoretical
random error deduced by Khoshelham and Elberink [26] for the Kinect^SL^. The
Kinect^SL^ delivers much stronger out-of-plane errors than the Kinect^ToF^, which
stays below 1.65 mm for the whole range of 4 m. The curve for the Kinect^SL^ is
roughly within the expected range. Compared to Khoshelham and Elberink [26],
we observe an additional fluctuation, which can be explained by the decreasing
depth resolution of the Kinect^SL^, which leads to a significant depth quantization
for increasing distances.
4.5.3. Intensity (Kinect^ToF^ only)

Similar to Lindner and Kolb [35], we evaluate the planar checkerboard with
varying gray levels at 1 m distance. For this scenario, we select horizontal pixel
lines across the gray-level rectangles and directly plot the distance values for
several distances to the wall. Fig. 17 shows that the Kinect^ToF^ delivers very
stable results and that the range error for the darkest rectangle is at most 3 mm
compared to the white reference distance. Compared to earlier ToF camera
prototypes, for which range errors of up to 50 mm have been observed [35], this is a
significant improvement in quality.


4.6. Semitransparent Liquid

Goal. This test scenario is designed to evaluate the effects of translucent,
i.e. semitransparent and scattering, material on the quality of the acquired object
geometry.



Figure 17: Intensity Related Error for Kinect^ToF^. Measured depth and intensity versus the
actual intensity of the checkerboard at about 1 meter distance. The intensity is given in
arbitrary digital units (adu) as delivered by the Kinect^ToF^.

Figure 18: Semitransparent Liquid. Depth error and amount of invalid pixels versus the
transparency of the liquid for the Kinect^SL^ (left) and the Kinect^ToF^ (right). Samples in
standard cuvettes are shown in the bottom row.


Experimental Setup. Similar to Hansard et al. [18], we use a sequence of
semitransparent liquids, i.e. a plastic cylinder filled with diluted milk. The cylinder
has an inner and outer diameter of 77 and 79 mm, respectively. By diluting the
milk with the same amount of water in each step, we get a sequence of 10 objects
with a milk fraction of 2^-k, k = 0,...,9, i.e. 100%,...,0.19% milk. We acquire 200
frames for each setup. The cylinders are acquired from a frontal view at a distance
of 1.2 m.
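
The dilution series can be reproduced with a few lines of Python. This is merely a worked illustration of the halving scheme described above; the function name `milk_fractions` is an assumption made for this example.

```python
def milk_fractions(steps=10):
    """Milk fraction after k halving steps, k = 0, ..., steps - 1."""
    return [2.0 ** -k for k in range(steps)]

# Concentrations in percent, from pure milk down to the thinnest sample.
percent = [100.0 * fraction for fraction in milk_fractions()]
for k, value in enumerate(percent):
    print(f"k = {k}: {value:.4f} % milk")
```

The sixth sample (k = 5) contains 3.125% milk, which corresponds to the 3.12% concentration referred to in the evaluation of this experiment.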

Fig. 18, bottom, shows the visual appearance of the milk samples. The
diluted milk is filled into cuvettes with a 1 cm square cross section and placed in
front of a checkerboard in order to demonstrate the degree of transparency in the
visual range.

In order to provide a quantitative transparency degree, we measured the light
penetration through the cuvettes at 850 nm. The measured intensity through a
cuvette filled with water was the reference (I_0), and each sample was divided by
the reference:

$$\mathrm{Penetration}_{@850\,\mathrm{nm}} = \frac{I}{I_0} \times 100,$$

where I denotes the intensity measured through the respective sample. The Kodak
Wratten 850 nm filter explained in Sec. 4.2 was applied to filter visible light.

Evaluation and Results. As in Hansard et al. [18], we directly measure the
signed error in depth for a manually segmented region in the range image with
respect to the mean image as a function of transparency by comparing
against a reference measurement with a non-transparent cylinder of the same
size; see Eq. (5). Furthermore, we plot the number of invalid pixels.

As can be seen in Fig. 18, top-left, the Kinect^SL^ performs very well for liquid
samples with more than 3.12% milk, with almost no invalid pixels and a signed
error in the range of [1, 1.5] mm, which is around the thickness of the plastic
cylinder. However, for the samples with a milk concentration below 3.12%, the
number of invalid pixels increases dramatically to above 90%, and the depth error
of the remaining valid pixels increases as well. For the same experiments, the
Kinect^ToF^ shows a positive distance error between 12 and 378 mm, but does not
mark any measurements as invalid, i.e. the number of invalid pixels is negligible;
see Fig. 18, top-right. Even for 0.2% milk, the Kinect^ToF^ still delivers valid
pixels, with an error rising up to 400 mm.
In conclusion, the Kinect^SL^ performs well for thicker semitransparent liquids and
indicates failure for the thinner cases. Using the Kinect^ToF^, on the other hand,
is much harder, as the device does not indicate a pixel's invalidity even for a
large distance error.


4.7. Reflective Board

Goal. This test evaluates the impact of strongly reflecting objects, which
potentially result in erroneous depth measurements, mainly due to multi-path effects.
Besides the reflectivity as such, the multi-path effect strongly depends on the
orientation of the reflective object towards other bright objects in the scene and
the camera. Therefore, we are mainly interested in the relation between the
angular orientation and the measured depth error.

Figure 19: Reflective Board. Sample range image with superimposed pivot line (left) and a
photo of the setup with unrolled screen (middle) and coiled curtain (right).

Experimental Setup. We use a common whiteboard of 60 x 40 cm size as the reflective
object and place it vertically on a turntable in front of a white projector
screen at a distance of 170 cm from the camera. The projector screen can be
rolled up, and behind it there is a non-reflecting black curtain that allows for
non-multi-path reference measurements; see Fig. 19, middle and right.
The vertical board is placed in front of the Kinect camera so that it
rotates around the pivot line, which intersects the center of the
turntable. The points lying on the pivot line remain at the same distance to the
camera. The board is rotated from 0° to 90° with a resolution of 0°15'. The specific
multi-path effect depends on the board angle. For each step, we acquire 20
frames.

Evaluation and Results. For each acquired pair of range images, i.e. for a given
fixed angle with a coiled and an unrolled screen, we select a vertical 4 x 100 pixel
region of interest around the rotation pivot on the whiteboard. For each pixel in
the vertical region of interest we assume a constant distance to the camera and
a constant multi-path situation. Within this region we compute the RMSE with
respect to the reference measurement at 90° and the SD of the measurement
itself as a function of the incident angle. Furthermore, we calculate the relative
number of invalid pixels.

Figure 20: Reflective Board. The SD and the amount of invalid pixels versus the angle of
incidence for the non-multi-path and the multi-path situation for the Kinect^SL^ (left) and the
Kinect^ToF^ (right).
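
The three statistics computed per board angle can be sketched as follows. The snippet is an illustration under assumptions: the reference measurement is reduced to a single scalar distance, invalid pixels are assumed to be marked by a depth value of 0, and the names `roi_statistics`, `reference_depth` and `invalid_value` are hypothetical.

```python
import numpy as np

def roi_statistics(roi_frames, reference_depth, invalid_value=0.0):
    """RMSE, SD and relative number of invalid pixels for one board angle.

    roi_frames:      (n_frames, rows, cols) depth values of the region in mm
    reference_depth: reference distance in mm (assumed to be a scalar here)
    invalid_value:   assumed marker for pixels flagged as invalid
    """
    roi = np.asarray(roi_frames, dtype=float)
    valid = roi != invalid_value
    invalid_ratio = float(1.0 - valid.mean())
    if not valid.any():
        return float("nan"), float("nan"), invalid_ratio
    values = roi[valid]
    rmse = float(np.sqrt(np.mean((values - reference_depth) ** 2)))
    sd = float(values.std())
    return rmse, sd, invalid_ratio

# Synthetic example: 20 frames of a 100 x 4 region with a 30 mm multi-path
# offset, 5 mm noise, and two frames that are entirely invalid.
rng = np.random.default_rng(2)
frames = 1700.0 + 30.0 + rng.normal(0.0, 5.0, size=(20, 100, 4))
frames[:2] = 0.0
rmse, sd, invalid_ratio = roi_statistics(frames, reference_depth=1700.0)
print(f"RMSE = {rmse:.1f} mm, SD = {sd:.1f} mm, IP = {100 * invalid_ratio:.0f} %")
```

Keeping the RMSE and the SD separate makes it possible to distinguish a systematic multi-path offset (large RMSE, small SD) from plain measurement noise.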


In Fig. 20 the RMSE and the SD for all acquisition angles for the Kinect^SL^
(left) and the Kinect^ToF^ (right) are plotted. Additionally, we plot the amount of
invalid pixels. As expected, the Kinect^SL^ has far fewer problems with this indirect
lighting setup, since the structured light principle is not confused by
diffusely scattered light. However, the Kinect^SL^ also has limitations
and delivers more invalid depth values for low incident angles, even though the
measurement principle should not be affected by this. One simple explanation
would be that too little light is reflected to the camera; however, this
would also be true for the reference measurement with the coiled screen.

For incident angles below 10°, up to 100% of the pixels are marked invalid.
For angles above 15°, the Kinect^SL^ yields nearly no invalid pixels and the depth
error is close to zero. The Kinect^ToF^, on the other hand, has a lot more problems
with the superposition of the indirect illumination, i.e. the multi-path situation.
Apparently, the Kinect^ToF^ is able to detect some of the corrupted pixels, but at
angles below 10°, pixels that are affected by the low incident angle are not classified
as invalid, resulting in extreme range errors of up to 800 mm. For incident angles
between 10° and 30°, the Kinect^ToF^ delivers up to 100% invalid pixels. Similar to
the Kinect^SL^, the Kinect^ToF^ range values are again more reliable for angles
above 35°, i.e. no invalid pixels are delivered and the depth error is below 50 mm.

Figure 21: The "Siemens Star". Mechanical details (left) and the segments as well as the
pixels used for the evaluation, marked in red (right).

4.8. Turning Siemens Star

Goal. This test aims at measuring the amount of flying pixels,
i.e. pixels that cover an inhomogeneous region in terms of depth and thus do not
deliver proper depth values, for static and dynamic scenes. Both Kinect cameras
mark unreliable pixels as "invalid", which also applies to flying pixels.

Experimental Setup. Similar to Lottner et al. [36], we manufactured a 3D Siemens
star. However, we mount it on a stepping motor that actuates the star in a
controlled fashion in front of a planar background wall at 1.8 m distance. The
geometrical dimensions of the Siemens star are shown in Fig. 21, left. We apply
different angular velocities while capturing range data with either of the Kinect
cameras. As both Kinects have different intrinsic parameters and different
range image resolutions, there are different options for the geometric setup of
the measurement. We opted for a setup where each sensor has the same "pixel
coverage" on the star. Thus, the cameras have different distances to the star during
the acquisition, but as both cameras have approximately the same temporal
resolution, i.e. frame rate, the pixel coverage for the lateral motion is comparable
for both Kinects.

Figure 22: Analysis for the Turning Siemens Star. The minimal, mean and maximal relative
numbers of foreground, background and invalid pixels plotted over the angular velocity for the
Kinect^SL^ (left) and the Kinect^ToF^ (right) for the segments "S0" (top) and "S1" (bottom).

We acquire range data for the static and the dynamic wheel. We have chosen
nine velocity steps between 0, 11, 22, ..., 100 RPM. In our setup, 100 RPM relates
to 35 pixels swept along the outermost circle (red arc in Fig. 21, right) by the wheel
within one range image, i.e. in 1/30 s.
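
The relation between angular velocity and pixel sweep can be checked with a short calculation. The sketch below assumes that the swept pixels are counted along the arc (arc length = radius x angle); the implied arc radius in pixels is a derived quantity that is not stated in the text, and all function names are chosen for this example only.

```python
import math

FRAME_TIME_S = 1.0 / 30.0  # duration of one range image

def sweep_angle_deg(rpm, frame_time_s=FRAME_TIME_S):
    """Angle in degrees covered by the wheel within one range image."""
    return rpm * 360.0 / 60.0 * frame_time_s

def implied_arc_radius_px(pixels_swept, rpm):
    """Arc radius in pixels implied by a given sweep (assumes arc = r * angle)."""
    return pixels_swept / math.radians(sweep_angle_deg(rpm))

def pixels_swept(rpm, radius_px):
    """Number of pixels swept along the arc within one range image."""
    return radius_px * math.radians(sweep_angle_deg(rpm))

angle_100 = sweep_angle_deg(100)            # degrees per frame at 100 RPM
radius_px = implied_arc_radius_px(35, 100)  # radius implied by 35 pixels
sweep_60 = pixels_swept(60, radius_px)      # sweep at 60 RPM
print(f"{angle_100:.1f} deg per frame, radius {radius_px:.1f} px, "
      f"{sweep_60:.1f} px at 60 RPM")
```

Under these assumptions the wheel turns by 20° per range image at 100 RPM, which corresponds to an outer arc radius of roughly 100 pixels.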
Evaluation and Results. In the evaluation we account for the fact that the
Kinect's illumination units are mounted horizontally for both cameras, leading
to different shadowing effects at vertical and horizontal edges. Thus, we expected
varying results between regions with predominantly vertical and horizontal edges
and performed the analysis separately for two wheel quarters, one at the
right ("S0") and one at the top ("S1"). For the evaluation we use pixels on
a circular arc at the outer part of the wheel, illustrated by the red arc in
Fig. 21, right.

In the ideal case, there should be 50% foreground and 50% background pixels
along the arcs. Therefore, we simply calculate the minimal, mean and
maximal relative numbers of foreground, background and invalid pixels for the
different speed values.
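
A possible way to compute these relative numbers is sketched below. The classification rule is an assumption made for this illustration: invalid pixels are taken as delivered by the device (marked here by a depth of 0), and valid pixels are split into foreground and background at the midpoint between the two known distances. The function name `arc_statistics` is hypothetical.

```python
import numpy as np

def arc_statistics(arc_depths, fg_mm, bg_mm, invalid_value=0.0):
    """Min, mean and max relative numbers of FG, BG and invalid pixels.

    arc_depths: (n_frames, n_arc_pixels) depth values along the arc in mm
    fg_mm:      distance of the star (foreground) in mm
    bg_mm:      distance of the wall (background) in mm
    """
    depths = np.asarray(arc_depths, dtype=float)
    invalid = depths == invalid_value
    threshold = 0.5 * (fg_mm + bg_mm)  # assumed decision boundary
    foreground = ~invalid & (depths < threshold)
    background = ~invalid & (depths >= threshold)
    stats = {}
    for name, mask in (("FG", foreground), ("BG", background), ("IP", invalid)):
        ratio = mask.mean(axis=1)  # relative number per range image
        stats[name] = (float(ratio.min()), float(ratio.mean()), float(ratio.max()))
    return stats

# Three synthetic range images with ten arc pixels each (0 marks invalid).
fg, bg = 1500.0, 1800.0
arc = np.array([
    [fg] * 5 + [bg] * 4 + [0.0],
    [fg] * 6 + [bg] * 4,
    [fg] * 5 + [bg] * 3 + [0.0] * 2,
])
stats = arc_statistics(arc, fg_mm=fg, bg_mm=bg)
for name, (low, mean, high) in stats.items():
    print(f"{name}: min {low:.2f}, mean {mean:.2f}, max {high:.2f}")
```

Evaluating the ratios per range image first and aggregating afterwards yields the minimal, mean and maximal values for each speed.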

Fig. 22 shows the results of the foreground-background analysis for both
cameras and both segments. One first insight is that the classification results
are very stable; in particular, the Kinect^ToF^ shows very little variation in its results.
Comparing the classification results for the static scene, i.e. the flying pixels,
the foreground classification is nearly perfect, i.e. 52.5% for the Kinect^SL^ and
53.2% for the Kinect^ToF^ for segment "S0". The amount of invalid pixels for this
segment is 7.0% for the Kinect^SL^ and 10.9% for the Kinect^ToF^.

For increasing speed, it is apparent that the Kinect^SL^ delivers fewer invalid
pixels and more (false) foreground pixels, resulting in 100% foreground pixels
for 100 RPM. The behavior of the Kinect^ToF^ is much more reliable. As one would
expect, the number of invalid pixels increases for higher speed and the numbers
of foreground and background pixels decrease in a comparable way. However,
there are always more foreground than background pixels. This effect can be
explained by the shadowing, which only applies to the background, i.e. the holes
in the Siemens star.

As expected, there are differences between the two arc segments for both
Kinects. In general, the results for the top segment "S1" are worse for both
devices, as shadowing effects are stronger for vertical edges. For the Kinect^SL^,
mainly the number of invalid pixels is higher for lower speed, which is
counter-intuitive. For the Kinect^ToF^, the differences between the two segments
are less prominent.

Figure 23: Range Profile for the Turning Siemens Star for the angular speed of 0 RPM (left)
and 60 RPM (right) measured by the Kinect^ToF^. Red indicates invalid pixels. Note that the
range between 4.3 and 5 radians is the lower part of the wheel, where the tripod distorts the
background region.


Besides the classification of the pixels along the arc, Fig. 23 shows the range
profile for 0 RPM and 60 RPM for the full outer pixel circle. This profile plot
shows an additional range distortion effect at the edges of the foreground parts.
Here, additional "overshooting" effects occur, which are due to motion artifacts
that occur with ToF cameras; see Sec. 2.3.


5. Conclusion 

This paper presents an in-depth comparison between the two versions of
the Kinect range sensor, i.e. the Kinect^SL^, which is based on the Structured
Light principle, and the new Time-of-Flight variant Kinect^ToF^. We present a
framework for evaluating Structured Light and Time-of-Flight cameras, such as
the two Kinect variants, for which we give detailed insight here. Our evaluation
framework consists of seven experimental setups that cover the full range of
known artifacts for these kinds of range cameras.
Ambient Background Light (Sec. 4.2)
  Kinect^SL^: Below 1µmW: IP=0%, SD<6mm. Above 1µmW: IP=100%.
  Kinect^ToF^: Below 1µmW: IP=0%, SD<11mm. Below 10µmW: IP=0%, SD<30mm.
    At 20µmW: IP=0%, SD<42mm.

Multi-Device Interference (Sec. 4.3)
  Kinect^SL^: IP<16.3% (evenly distributed).
    Frame <400: RMSE w/o interf. <6.7mm, RMSE w interf. <7.7mm.
    Rest of frames: RMSE w/o interf. <5.8mm, RMSE w interf. <9.4mm.
  Kinect^ToF^: IP<22.7% (repetitively blocked).
    RMSE w/o interf. <4.6mm, RMSE w interf. <19.3mm.

Temperature Drift (Sec. 4.4)
  Kinect^SL^: Before 10': RMSE ∈ [4; 4.6]mm, SDA ∈ [0.6; 1.8]mm.
    After 10': RMSE rising from 4.0 to 7.1mm, SDA ∈ [0.6; 1.0]mm.
  Kinect^ToF^: RMSE ∈ [4.6; 5.3]mm and SDA ∈ [1.3; 1.5]mm.

Linearity Error (Sec. 4.5.1); Pt1 (center) ... Pt4 (corner); Kinect^SL^ below 3m
  Kinect^SL^: SE1 ∈ [-34; -1.5]mm, SE2 ∈ [-6.5; 62]mm, SE3 ∈ [-4; 129]mm, SE4 ∈ [2.5; 76]mm.
    SD1-3 ∈ [0.4; 14]mm, SD4 ∈ [0.4; 28]mm.
  Kinect^ToF^: SE1 ∈ [-8; 29]mm, SE2 ∈ [-8; 17]mm, SE3 ∈ [-8; 37]mm, SE4 ∈ [-69; 62]mm.
    SD1-3 ∈ [0.8; 6.8]mm, SD4 ∈ [1.8; 90]mm.

Systematic Error: Planarity (Sec. 4.5.2)
  Kinect^SL^: SD(<1.5m) ∈ [1.2; 4.8]mm, SD(<2.5m) ∈ [2.7; 16.6]mm, SD(<3.5m) ∈ [7.5; 30.9]mm.
  Kinect^ToF^: SD<1.65mm.

Intensity Related Error (Sec. 4.5.3)
  Kinect^SL^: not applicable.
  Kinect^ToF^: Distance error <3mm at 1m distance.

Semitransparent Media & Scattering (Sec. 4.6); Light Penetration > 80%
  Kinect^SL^: SE ∈ [1; 1.5]mm, IP<5%.
  Kinect^ToF^: SE ∈ [17.89; 378.1]mm, IP<2%.

Multipath Effect (Sec. 4.7)
  Kinect^SL^: Incid. angle <13°: IP>90%.
    Incid. angle >20°: IP<1%, Err. <5mm.
  Kinect^ToF^: Incid. angle <10°: IP<70%, Err. >600mm.
    Incid. angle ∈ [10°, 30°]: IP up to 100%.
    Incid. angle >30°: IP<2%, Err. <4mm.

Depth Inhomogeneity and Dynamic Scenery (Sec. 4.8)
  Kinect^SL^, Seg. "S0": FG from 52.5 to 99.48%, SD<5.9mm. BG from 43.0 to 0.5%, SD<6.0mm.
    IP from 7 to 0%, SD<0.9mm.
  Kinect^ToF^, Seg. "S0": FG from 53.2 to 44.8%, SD<1.3mm. BG from 35.9 to 19.5%, SD<1.5mm.
    IP from 10.9 to 35.6%, SD<2.1mm.
  Kinect^SL^, Seg. "S1": FG from 52.33 to 100%, SD<5.0mm. BG from 19.4 to 0%, SD<3.5mm.
    IP from 28.1 to 0%, SD<3.1mm.
  Kinect^ToF^, Seg. "S1": FG from 57 to 41.11%, SD<1.4mm. BG from 36 to 18.3%, SD<1.6mm.
    IP from 6.8 to 40.4%, SD<1.5mm.

Table 4: Summarizing the major Kinect characteristics. IP = "invalid pixel", SD = "standard
deviation", RMSE = "root mean square error", SDA = "standard deviation average", SE = "signed
error", BG = "background", FG = "foreground".
Table 5: Device failure ratios for two application modes for the major error sources discussed
in this paper. A ratio of 1 indicates equal behaviour of both devices; values close to 0 and
infinity indicate a relatively high failure for Kinect^SL^ and Kinect^ToF^, respectively. The color
intensity indicates deviations from 1, i.e. cases where both devices behave differently.
Device Selection Hints

Since device selection is highly application dependent, Tab. 5 sets up some
rules of thumb to help users make a more informed decision on which device
to select depending on their application circumstances. The table compares
device performance in different conditions with respect to the main error sources
discussed above. For each source, we define two modes of operation, i.e. two
ranges of application parameters. For each mode, we state a failure weight,
which is the performance ratio of both Kinect cameras. The individual failure
values are deduced by combining error values and the number of valid pixels
for each mode. Based on the individual failure values we compute the ratio
between the Kinect^SL^ and the Kinect^ToF^ failure, whereby a ratio close to 1 means
that both devices perform quite similarly, whereas values close to 0 or close to
infinity indicate that one of the two devices has a relatively high failure rate
compared to the other. For a specific application scenario, the user selects the relevant
error sources and, by multiplying the failure ratios, computes the overall failure ratio.
If this final ratio is smaller than 1, the Kinect^SL^ would be the best choice; otherwise
the Kinect^ToF^ is preferable. Of course, this is only a very coarse but quick guideline
resulting in a first suggestion. The user should in any case have a further look at
the details for the error sources that are most relevant to the specific application.

Note that we dropped the error sources "Intensity Related Error" and
"Multi-Device Interference" from Tab. 5, because the "Intensity Related Error"
applies only to the Kinect^ToF^ and has, compared to prior ToF devices, only very
little impact. Furthermore, if "Multi-Device Interference" is essential to the
application, further actions need to be applied, such as using "Shake'n'Sense" in
case of the Kinect^SL^ [7] or different modulation frequencies in case of the Kinect^ToF^.

Example 1: User A requires a depth sensing device for indoor scene
reconstruction where the scene has static semi-reflective surfaces at high angles:
Failure ratio = 0.55 x 4 x 0.16 x 0 x 7 x 5.45 x 0.87 = 0
Therefore, the Kinect^ToF^ would absolutely fail in this application.

Example 2: User B requires face gesture recognition at 1.2 meter distance in
indoor office conditions:
Failure ratio = 0.55 x 4 x 0.16 x 1 x 7 x 5.45 x 0.87 = 11.68
As the failure ratio is more than 1, user B should choose the Kinect^ToF^ for this
application.
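
The rule of thumb can be written down in a few lines of Python. This sketch only restates the multiplication and the decision rule given in the text; the function names `overall_failure_ratio` and `suggest_device` as well as the returned labels are chosen for this example.

```python
import math

def overall_failure_ratio(ratios):
    """Product of the failure ratios of all relevant error sources."""
    return math.prod(ratios)

def suggest_device(ratio):
    """Coarse first suggestion: below 1 the SL device, otherwise the ToF device."""
    return "Kinect SL" if ratio < 1 else "Kinect ToF"

# Failure ratios of the two examples, copied from the text.
ratio_a = overall_failure_ratio([0.55, 4, 0.16, 0, 7, 5.45, 0.87])
ratio_b = overall_failure_ratio([0.55, 4, 0.16, 1, 7, 5.45, 0.87])
print(f"Example 1: {ratio_a:.2f} -> {suggest_device(ratio_a)}")
print(f"Example 2: {ratio_b:.2f} -> {suggest_device(ratio_b)}")
```

Since the ratios are multiplied, a single factor of 0 dominates the result regardless of the other error sources, which is why Example 1 yields an overall ratio of 0.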

Open Science 

We have prepared a website to make the following material publicly available: 

1. A documented version of the evaluation scripts for all experiments, written
in Matlab.

2. Further technical details for setting up the required test scenarios, e.g. a
CAD file for the Siemens star and the intensity and calibration checkerboards.

The website is available for use by other researchers at:
http://www.cg.informatik.uni-siegen.de/data/KinectRangeSensing/.